The SANS Institute recently updated and expanded their Top 10 Most Critical
Internet Security Threats List. The new list identifies the Top Twenty
vulnerabilities and is divided into three categories: General, Windows, and
UNIX vulnerabilities. The vulnerabilities identified are:


General: 

G1 — Default Installs of Operating Systems and Applications 

G2 — Accounts with No Passwords or Weak Passwords 

G3 — Non-existent or Incomplete Backups 

G4 — Large Number of Open Ports 

G5 — Not Filtering Packets for Correct Incoming and Outgoing Addresses 
G6 — Non-existent or Incomplete Logging 

G7 — Vulnerable CGI Programs 


Windows: 

W1 — Unicode Vulnerability (Web Server Folder Traversal)
W2 — ISAPI Extension Buffer Overflows 

W3 — IIS RDS Exploit (Microsoft Remote Data Services) 
W4 — NETBIOS - Unprotected Windows Networking Shares 
W5 — Information Leakage via Null Session Connections 
W6 — Weak Hashing in SAM (LM hash) 


UNIX: 

U1 — Buffer Overflows in RPC Services 
U2 — Sendmail Vulnerabilities 

U3 — BIND Weaknesses 

U4 — R Commands 

U5 — LPD (Remote Print Protocol Daemon)
U6 — sadmind and mountd 

U7 — Default SNMP Strings 


Besides identifying weaknesses, the Top Twenty document provides instructions 
for correcting these system flaws. According to the SANS Web site, the list will be 
updated with additional information and vulnerabilities as they are identified. For each 
described vulnerability, the document now provides detailed information about the 
systems impacted, CVE entries (see cve.mitre.org for more information), how to 
determine whether your system is vulnerable, and how to protect against it. 

The list also provides an appendix specifying commonly probed ports. SANS 
cautions that blocking these ports is only a minimum requirement for perimeter 
security, not a comprehensive firewall specification list. Also, blocking these ports 
does not constitute a comprehensive security solution. According to the site, “Even 
if the ports are blocked, an attacker who has gained access to your network via 
other means (a dial-up modem, a Trojan email attachment, or a person who is an 
organization insider, for example) can exploit these ports if not properly secured on 
every host system in your organization.” 

For comprehensive information about the new Top Twenty list, please see the
SANS Web site at: www.sans.org.
SECURITY


Duncan Napier 


The IPTables/NetFilter application is considered to be the fourth
generation of Linux packet filtering implementations. The first
generation was Alan Cox’s port of BSD UNIX’s ipfw to Linux 1.1. Jos Vos
and others extended this and added the ipfwadm user tool for
manipulating the filtering rules in the Linux 2.0 kernel. Paul “Rusty”
Russell and Michael Neuling made some significant modifications to the
2.2 Linux kernel, and Russell added the user tool ipchains for
controlling filtering rules for this kernel. Russell has now implemented
a kernel framework called NetFilter.

One of the goals of NetFilter was to provide a single, dedicated
packet filter/mangler infrastructure that users and developers could
deploy as an add-on built around the Linux kernel. For purposes
of this article, packet filtering refers to the redirection of packets
(but not modification of packet headers), while mangling refers to
packet modification, typically of the source and/or destination IP
address. NetFilter was designed to be modular and extensible. IPTables
is a module that plugs into the NetFilter framework and allows the user
access to kernel filtering/mangling rules and commands. If you are
familiar with ipchains, you will notice the similarity between the
syntax and format of IPTables and ipchains.

It is also worth noting that NetFilter is outside of the standard
Berkeley socket interface and as a result is, at the time of writing,
restricted to the Linux OS.
The official NetFilter home page is:


http://netfilter.samba.org/ 


and it provides the latest documentation, information,
patches, and releases related to the NetFilter Project. It
is critical to stay up to date because IPTables and
NetFilter have been undergoing a rapid proliferation
of features. Also, if you choose to implement IPTables
in a secure production environment, be careful which
features you choose to enable, because some features,
as noted in the installation, are experimental and have
not been fully tested.


Some Features and Enhancements Over ipchains and ipfwadm


Here are some of the firewalling features of 
IPTables: 


• Stateful Inspection — Perhaps the most talked-about enhancement that
NetFilter/IPTables adds to Linux is stateful packet inspection.
Stateless firewalls (such as those using Linux ipchains) check and
filter each packet individually. Stateless filters can, for example,
differentiate between packets that are requesting a connection and
those that belong to an established session by checking whether the
SYN flag in the packet header is set (and whether the FIN and ACK
flags are cleared). A stateless firewall has no recollection of the
past history of the connection (or lack of it) and therefore makes
filtering choices based only on the header information presented to it
by the packet itself. Stateful firewalls record information related to
an entire session, such as source and destination addresses, port
numbers, status information from header flags (e.g., SYN, ACK, FIN
flags), and TCP sequencing information. This allows stateful systems to
make filtering decisions in the context of an entire session rather
than that of an individual packet and its header information. For
example, if you have clients behind a firewall that make Domain Name
Server (DNS) queries to an external DNS server, the client initiates a
query by connecting from one of its high-numbered ports to UDP port 53
on the server. The DNS server answers the query from UDP port 53 back
to the high-numbered port on the client. On a conventional ipchains
firewall, all high-numbered ports above 1023 must be opened to
accommodate the inbound replies. A stateful firewall can be configured
to only accept incoming UDP packets with a source address that matches
the destination address of previously outgoing packets. Therefore, the
firewall will only accept query responses from DNS servers that match
outgoing queries that the firewall has already seen.
IPTables/NetFilter stateful inspection can be customized and extended
with connection tracker and NAT (network address translation) helper
modules.
• Enhanced Network Address Translation (NAT) — In addition to
Source Network Address Translation (SNAT), which is usually
used by two or more machines to share one public IP address
using a proxy, IPTables also does Destination NAT (DNAT).
DNAT modifies the destination address (as opposed to SNAT,
which modifies the source IP address) and allows traffic to be
redirected to a proxy. Ipchains did not do port forwarding by itself
and required the installation of the ipmasqadm utility for such
things as forwarding ports into internal hosts. DNAT in
IPTables now handles this function. DNAT can also be used as a
very basic load-balancing tool to handle traffic flow to a cluster
of machines. Additional packet-mangling functions that modify
source and destination IP addresses have also been added,
including MIRROR, which reverses source and destination
addresses for sending packets back to their sources.
• Enhanced packet inspection — All six TCP flags can be examined
by IPTables, as opposed to just the SYN flag setting in ipchains.
This allows a much finer level of control of packets that enter
or leave the network. Matches based on a list of source or
destination ports can be specified, instead of a single port or single
range of ports. This greatly simplifies script maintenance. There is
also a match extension (owner) that can filter locally generated
packets based on the user id, group id, process id, and session id of
the process that created them.
• MAC address filtering — Traffic can now be filtered at the MAC
(Ethernet hardware or Media Access Control) address level.

• Enhanced logging — As of the current release, there are eight
levels of logging (debug, info, notice, warning, err, crit, alert,
emerg) as well as the ability to prepend log messages with
customized strings for unique identification.

• Rate-Limited matching — IPTables can limit the rate at which the
firewall handles connection requests. This feature can be used not
only to protect against Denial-of-Service attacks (such as SYN
floods) and port scans, but also to limit the rate of logging for
repetitive processes.

• Type of Service (TOS) prioritization — Traffic can be selectively
prioritized into separate queues. The type of service levels
currently available are Minimize-Delay, Maximize-Throughput,
Maximize-Reliability, Minimize-Cost, and Normal-Service.


Installation and Configuration 

IPTables and NetFilter require a Linux kernel version 2.3.15 or 
greater. A back port to lower versions (2.2.x) of Linux has been 
suggested by its author, but at the time of writing, this has not been 
implemented. I have installed and tested NetFilter/IPTables 1.2.1a 
on a Red Hat Beta 2.4.1 distribution (Wolverine) and a generic 2.4.3 
kernel. NetFilter can be compiled into the kernel or run as a load- 
able module. For purposes of this discussion, I will assume that 
NetFilter is compiled into the kernel. 

The IPTables tool can be downloaded from the official NetFilter Web
site. Given the rapid pace of development of IPTables, I recommend
using the latest version (in my case the patched version 1.2.1a)
containing the latest patches and bug fixes. Following the instructions
contained in the distribution, we run make with the switches to patch
the kernel and interactively choose features and tweaks to apply. We
then make the package as instructed. Next, we do a standard recompile
of the Linux kernel. You can consult the Linux kernel HOWTO
(http://linuxdocs.org/HOWTOs/Kernel-HOWTO.html) for more information,
but here are the basic steps.

You can manually edit the file .config in the kernel source directory
tree (usually under /usr/src/linux or something similar) to include or
exclude modules, or run the kernel make interactively. From the command
line, first cd into the top-level directory of the kernel source, and
run the command-line interactive configuration:


# cd /usr/src/linux 
# make config 


(or use make menuconfig, which uses ncurses, or make xconfig, which runs
under the X Window System) and answer “Y” to the CONFIG_IP_NF flags that
you wish to compile into the kernel. IPTables and its associated
functions can be loaded as modules or compiled into the kernel. To run
the basic functional core of NetFilter, the kernel needs to be compiled
with the options CONFIG_NETFILTER=Y and CONFIG_IP_NF_IPTABLES=Y.
Otherwise, you will have to load the modules either manually or through
a script before accessing them. To access each of the numerous functions
of IPTables, make sure that the appropriate features have first been
compiled as modules or compiled into the kernel. The compilation of
these modules is controlled through the flags CONFIG_IP_NF_* in the
.config file for the kernel source. For example, the flag
CONFIG_IP_NF_NAT determines whether to enable Network Address
Translation (NAT). The flag CONFIG_IP_NF_MATCH_MAC determines whether
MAC address matching is enabled.

Rather than being implemented through standard ipfilter commands,
some of the stateful inspection controls are included as separate
loadable code that is called a “helper module”. For example,
ip_conntrack_ftp is the helper module that tracks FTP connections and
is controlled by the kernel compilation flag CONFIG_IP_NF_FTP. If you
choose to track normal or active mode FTP connections, set
CONFIG_IP_NF_FTP=Y, and the helper module will be statically compiled
into the kernel. If you choose to compile the helper module as a
loadable module, set CONFIG_IP_NF_FTP=M, compile, run depmod on the
library object code, then run:


# /sbin/modprobe ip_conntrack_ftp 


to load the connection tracker. Note that statically compiling or
loading multiple connection trackers (for example CONFIG_IP_NF_FTP
and CONFIG_IP_NF_IRC, for Internet Relay Chat service) will
cause rule and compilation conflicts and should be avoided.
Information on writing and extending connection tracking and NAT
(Network Address Translation) modules can be found at:


http://www.gnumonks.org/ftp/pub/doc/conntrack+nat.html


I suggest, for simplicity, statically compiling the modules into the
kernel if you are configuring IPTables/NetFilter for the first time.

Once you have finished setting all the 
configuration parameters for your new 
kernel, run: 


# make dep 
# make bzImage 


to generate the kernel boot image. 
Copy the resulting boot image to the 
/boot partition: 


# cp /usr/src/linux/arch/i386/boot/bzImage \
  /boot/vmlinuz-iptables-1.2.1a


Edit the boot loader config file /etc/lilo.conf:


boot=/dev/hda
map=/boot/map
install=/boot/boot.b
prompt
timeout=100
image=/boot/vmlinuz-iptables-1.2.1a
    label=linux-iptables
    root=/dev/hda1
    read-only
image=/boot/vmlinuz-2.4.x.x
    label=linux
    root=/dev/hda1
    read-only


Rerun lilo and you should see: 


linux-iptables * 
linux 


You should now have the option of booting 
into the kernel “linux-iptables” when you 
reboot your machine. 


Setting Up IPTables Firewalling 

Once you reboot your machine into 
the new kernel, you can try running 
IPTables commands. This would typically 
be automated on startup through an rc.d startup script. There are
numerous freely available IPTables firewall scripts in the public 
domain. A search under “IPTables firewall script” in any Internet 
search engine will list numerous options. I cannot vouch for each 
and every script that exists in the public domain, but many will 
provide an excellent starting point for building your firewall. 

IPTables syntax is very similar to that of ipchains. Like ipchains,
IPTables contains built-in as well as user-defined lists of rules
(the chains), which each packet must traverse. When a packet matches a
rule, the rule's “target” determines what happens next: either an action
(e.g., ACCEPT, DROP) or a jump to a user-defined chain.
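
The chain traversal described above can be modeled in a few lines of
Python. This is only a sketch of the idea, not NetFilter code: the
function name traverse, the dictionary layout, and the sample chain
tcp_in are assumptions made up for the illustration. It also assumes
that a packet reaching the end of a user-defined chain returns to the
calling chain, and that a packet reaching the end of a built-in chain
receives that chain's policy.

```python
def traverse(chains, policies, chain, packet):
    """Return the verdict for a packet, starting at the named chain."""
    for match, target in chains[chain]:
        if not match(packet):
            continue
        if target in chains:
            # Jump to a user-defined chain.
            verdict = traverse(chains, policies, target, packet)
            if verdict is not None:
                return verdict
            continue  # That chain gave no verdict: resume with the next rule.
        return target  # A terminal target such as ACCEPT or DROP.
    # Built-in chains fall back to their policy; user-defined chains
    # have no policy, so None is returned to the caller.
    return policies.get(chain)


# Hypothetical rule set: INPUT hands TCP to a user-defined chain that
# accepts only port 22; everything else gets the INPUT policy.
chains = {
    "INPUT": [(lambda p: p["proto"] == "tcp", "tcp_in")],
    "tcp_in": [(lambda p: p["dport"] == 22, "ACCEPT")],
}
policies = {"INPUT": "DROP"}

ssh_verdict = traverse(chains, policies, "INPUT", {"proto": "tcp", "dport": 22})
telnet_verdict = traverse(chains, policies, "INPUT", {"proto": "tcp", "dport": 23})
udp_verdict = traverse(chains, policies, "INPUT", {"proto": "udp", "dport": 53})
print(ssh_verdict, telnet_verdict, udp_verdict)
```

The order of the rules matters in this model, just as it does in a real
rule set: the first matching rule with a terminal target decides the
fate of the packet.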


Syntax and Use of IPTables 

Comprehensive documentation of IPTables/NetFilter is available
online (see references below) and additional documentation
can be found in the IPTables man pages.

IPTables has the following options to manage whole chains: 


-N — Create a new chain. 

-X — Delete an empty chain. 

-P — Change the policy for a built-in chain. 
-L — List the rules in a chain. 

-F — Flush the rules out of a chain. 

-Z — Zero the packet and byte counters. 


The following are ways to manipulate rules inside the chain: 


-A — Append a new rule. 
-I — Insert a new rule.
-R — Replace a rule. 

-D — Delete a rule. 
A simple script with comments written by Oskar Andreasson
of BoingWorld.com is shown in Listing 1. The script shows the
implementation of some simple IP forwarding, masquerading
(NAT), spoofing checks of non-routable addresses (RFC1918),
and the opening of some ports to allow access. If you are familiar
with ipchains, you will probably have little difficulty in following
most of the rules.

The command: 


iptables -P <chain name> <policy> 


sets the default policy for the chain, either ACCEPT or DROP (the
equivalent of DENY in ipchains). Only built-in chains (such as INPUT,
OUTPUT, and FORWARD) have policies. Packet-mangling activities that
modify packets in transit (such as NAT and proxying) use two additional
predefined chains, PREROUTING (usually for DNAT) and POSTROUTING
(usually for SNAT). The names are descriptive enough and convey the fact
that DNAT destination addresses are typically rewritten before other
chain rules are applied, and SNAT source addresses are rewritten after
packets have traversed the rule chains.

Rules are added to chains using the following syntax:


iptables -A <chain name> <match condition> -j <jump> 


The condition is a logical match of, for example, the following: 


-s <source IP>
-d <destination IP>
--sport <source port>
--dport <destination port>
-p <protocol tcp, udp, or icmp>
-m <match, e.g., MAC address or TCP state>
-m owner --uid-owner <user id> (likewise --gid-owner, --pid-owner, --sid-owner)


The above match conditions are meant to be extensible, and numerous
other extensions exist. To learn more about a match extension,
use the -h option. For example, for the ICMP protocol (-p), type:


# /sbin/iptables -p icmp -h


and a list of the three dozen or so --icmp-type extensions will be
displayed.

The jump (or “judgement”) following the -j is one of the 
following: 


ACCEPT — Accept the packet. 

DROP — Drop the packet. 

REJECT — Drop the packet and respond with “port-unreachable” 
ICMP packet. 

SNAT, DNAT — Rewrite the packet's source or destination address.

LOG — Log to the kernel logging daemon klogd. 

TOS — Impose a type/level of service. 


Example Applications of Some Rules 
Here are some example rulesets for the features described in the 
“Enhancements” section: 


1. Stateful Inspection — The rule: 


/sbin/iptables -A FORWARD -m state --state \ 
ESTABLISHED,RELATED -j ACCEPT 


forwards packets across the firewall that are part of a pre-existing
connection. Besides ESTABLISHED (packet is part of an existing
connection) and RELATED (packet is related to an existing connection and
passing in the same direction), other defined states are NEW (packet is
trying to create a new connection), INVALID (packet doesn’t match any
existing connection), and RELATED+REPLY (packet is not part of an
existing connection, but is related to one, e.g., an ftp-data transfer
requested following an existing ftp-control session).

2. DNAT — To redirect traffic from the 10.0.0.0 network that is addressed
to the Web server 10.0.1.1 to the Web server 10.0.1.100, use the rule:


# /sbin/iptables -t nat -A PREROUTING -s 10.0.0.0/24 -d 10.0.1.1 \
-p tcp --dport 80 -j DNAT --to-destination 10.0.1.100


DNAT rules belong in the PREROUTING chain of the nat table, because the
destination address must be rewritten before the routing decision is
made. You can implement a load-balancing solution and redirect all
incoming traffic to port 8080 on a group of servers with IP addresses
10.0.1.100-10.0.1.102:


# /sbin/iptables -t nat -A PREROUTING -p tcp -s 10.0.0.0/24 \
-d 10.0.1.1 --dport 80 -j DNAT --to-destination 10.0.1.100-10.0.1.102:8080


3. Enhanced TCP monitoring — To examine all six TCP flags and drop
packets in which only the SYN and ACK flags are set:


# /sbin/iptables -A INPUT -p tcp --tcp-flags ALL SYN,ACK -j DROP


To deny outgoing connections to insecure telnet, FTP, and rsh
services using a single command (use the FORWARD chain instead of
OUTPUT for traffic that is routed through the firewall):


# /sbin/iptables -A OUTPUT -p tcp -m multiport --dports telnet,ftp,shell -j DROP


Note that in ipchains, each port would have required its own
separate rule.
4. Filtering by MAC address — The rule: 


# /sbin/iptables -A FORWARD -m state --state \ 
NEW -m mac --mac-source 00:C7:8F:72:14 -j ACCEPT 
allows only outgoing packets from a known MAC address, given
in colon-separated hex notation. 
5. Enhanced logging: 


# /sbin/iptables -A INPUT -s 192.168.0.1 -m limit --limit \
1/second -j LOG


limits the rate of writes to the logs to one per second. Specific 
matches can be labeled. For example, log entries corresponding 
to connect requests from the Litigation Department’s Cisco 1601 
router can be labeled: 


# /sbin/iptables -A INPUT -s 192.168.5.254 \
-j LOG --log-prefix 'Litigation Dept Cisco 1601 '


The logfile entry looks like:


Aug 1 14:58:39 mymachine kernel: Litigation Dept Cisco 1601 \
IN=eth0 OUT= MAC=00:f0:28:2c:69:67:00:00:7a:93:5e:62:08:00 \
SRC=192.168.5.254 DST=192.168.1.254 LEN=40 TOS=0x00 PREC=0x00 \
TTL=247 ID=21864 DF PROTO=TCP SPT=42300 DPT=23 WINDOW=8760 RES=0x00 \
RST URGP=0


6. Rate-Limited matching — To protect against SYN-flood denial of
service, only accept a maximum of 1 connection request per second:


# /sbin/iptables -A FORWARD -p tcp --syn -m limit --limit 1/s -j ACCEPT


7. Type of Service (TOS) prioritization — To maximize ssh 
response while maintaining maximum file data transfer over 
HTTP connections, the following rule can be applied: 


# /sbin/iptables -A PREROUTING -t mangle -p tcp --sport ssh \
-j TOS --set-tos Minimize-Delay

# /sbin/iptables -A PREROUTING -t mangle -p tcp --sport http \
-j TOS --set-tos Maximize-Throughput
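
The rate-limited matching used in examples 5 and 6 behaves roughly like
a token bucket: each match spends a token, and tokens are refilled at
the configured rate up to a burst ceiling. The Python sketch below
illustrates that idea only. The class TokenBucket and its method names
are hypothetical, and the burst of 5 mirrors the default of the
--limit-burst option, which the examples above do not set explicitly.

```python
class TokenBucket:
    """Toy model of a rate limit: 'rate' tokens per second, capped at 'burst'."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # Start with a full bucket.
        self.last = 0.0             # Time of the previous check, in seconds.

    def allow(self, now):
        # Refill for the time that has passed, never beyond the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True             # The rule matches (e.g., -j ACCEPT applies).
        return False                # Over the limit: the rule does not match.


bucket = TokenBucket(rate=1.0, burst=5)
flood = [bucket.allow(now=0.0) for _ in range(8)]  # Eight SYNs at once.
later = bucket.allow(now=1.0)                      # One more a second later.
print(flood, later)
```

In this model the first five packets of a sudden flood are matched and
the rest are not, until enough time has passed to earn another token.
Packets that fail to match simply continue to the next rule or to the
chain policy, so a limiting ACCEPT rule only protects anything when the
packets it turns away are eventually dropped.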


Conclusion 

This article explains some new firewalling features of 
IPTables that administrators of ipchains-based Linux firewalls 
may find useful. My aim was also to provide a starting point for 
administrators who are familiar with ipchains-based firewalling
and are considering a move to IPTables/NetFilter. Much of the 
hoopla surrounding the Linux 2.4 kernel has revolved around 
IPTables support for stateful packet inspection. However, I hope 
that this article has shown that there is more to IPTables than 
merely stateful inspection. IPTables also can provide firewalling 
scripts that are cleaner and easier to read, and easier to maintain. 
It seems that with its many powerful new features, Open Source 
Linux firewalling has finally come of age. 
There are many security features of Cisco’s IOS that may be uncommon,
yet are useful in the quest for heightened network security. These
features provide the ability to scrutinize network traffic more
critically, in the case of Reflexive Access Control Lists, or even to
alter the path that sensitive network data takes, using Policy Based
Routing. TCP Intercept and Unicast Reverse Path Forwarding help
eliminate damaging and security-threatening traffic at the edge of the
network by validating inbound connection attempts and their sources.
The following sections describe some of these useful and underutilized
Cisco security features.


Reflexive ACLs


Typically, Access Control Lists (ACLs) are statically configured
to drop or permit network packets. This can make securing the network
difficult, as return traffic for connections initiated from inside
the network must explicitly be allowed. With reflexive ACLs, the
router will automatically create entries in a dynamic ACL, which can
then be evaluated to authorize return traffic. As these dynamic ACL
entries are automatically created for a specific connection and specify
the IP protocol, source and destination addresses, and ports of the
connection, they improve network security by only allowing inbound
traffic that is directly associated with a connection initiated from the
inside of the network. Without this functionality, the only alternative
is to allow the return traffic for all possible outbound connections,
which makes security difficult to achieve.

Reflexive ACLs increase the power of IP extended named access
control lists by including two additional keywords:
“reflect” and “evaluate”. The reflect keyword is used
to update a dynamic ACL with the mirror image of
the packet matching the ACL entry. Return traffic is
later checked against this dynamic ACL using the
evaluate keyword. Below is an example of the outbound
and inbound IP named extended ACLs required
to allow a DNS client to talk to a DNS server without
using Reflexive ACLs:


ip access-list extended OUTBOUND 
permit udp any any eq domain 

ip access-list extended INBOUND 
permit udp any eq domain any 


While this may look benign enough, it will allow 
any outside computer sending UDP traffic sourced on 
port 53 to communicate with any computer inside 
your network via UDP. If we extend our example to 
include the “reflect” and “evaluate” commands, we 
end up with a more secure configuration: 


ip access-list extended OUTBOUND
permit udp any any eq domain reflect REFLEXIVE
ip access-list extended INBOUND
evaluate REFLEXIVE


In this example, the reflexive ACL is dynamically changed to allow the
return DNS traffic. For example, if a client at 192.168.1.1 sends a
query from source port 1024 to a DNS server at IP address 172.16.1.1,
the reflexive ACL is modified, essentially adding the line “permit udp
host 172.16.1.1 eq domain host 192.168.1.1 eq 1024” to the inbound ACL.

There are two circumstances that result in the removal of an entry 
from a reflexive ACL. These are the end of a TCP session via the 
FIN or RST bits, or the lack of traffic in a TCP or UDP session. The 
inactivity timeout can be set globally through the ip reflexive-list 
timeout global configuration command, or for an individual reflexive 
ACL via the “timeout” option during creation. Here is a portion of the 
previous example extended to include the timeout option: 


ip access-list extended OUTBOUND 
permit udp any any eq domain reflect REFLEXIVE timeout 120 


After a reflexive ACL has been configured, you can view the 
dynamic contents of the ACL with the show access-list privileged 
EXEC command. This command will show the number of packets 
that have matched each entry, as well as the time in seconds until the 
entry will be removed if it remains inactive. 
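
The reflect/evaluate mechanism and its inactivity timeout can be
modeled with a small Python sketch. It is an illustration of the
behavior described above, not of how IOS implements it: the class
ReflexiveAcl and its methods are hypothetical, time is passed in as a
plain number of seconds, and the removal of TCP entries on FIN or RST
is left out for brevity.

```python
class ReflexiveAcl:
    """Toy reflexive ACL: mirrored entries that expire when idle."""

    def __init__(self, timeout):
        self.timeout = timeout   # Idle seconds before an entry is removed.
        self.entries = {}        # Mirrored flow -> time at which it expires.

    def reflect(self, proto, src, sport, dst, dport, now):
        # Outbound packet: store its mirror image (source and
        # destination swapped) so that the reply will match.
        self.entries[(proto, dst, dport, src, sport)] = now + self.timeout

    def evaluate(self, proto, src, sport, dst, dport, now):
        # Inbound packet: permit only if a live mirrored entry exists.
        key = (proto, src, sport, dst, dport)
        expiry = self.entries.get(key)
        if expiry is None:
            return False
        if now >= expiry:
            del self.entries[key]
            return False
        self.entries[key] = now + self.timeout  # Traffic restarts the timer.
        return True


acl = ReflexiveAcl(timeout=120)
# Client 192.168.1.1 port 1024 queries the DNS server 172.16.1.1.
acl.reflect("udp", "192.168.1.1", 1024, "172.16.1.1", 53, now=0)
reply = acl.evaluate("udp", "172.16.1.1", 53, "192.168.1.1", 1024, now=5)
stranger = acl.evaluate("udp", "10.9.9.9", 53, "192.168.1.1", 1024, now=5)
stale = acl.evaluate("udp", "172.16.1.1", 53, "192.168.1.1", 1024, now=500)
print(reply, stranger, stale)
```

The genuine reply is permitted, a packet from an unrelated host that
merely uses source port 53 is refused, and the same reply arriving long
after the timeout has elapsed is refused as well.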


Policy Based Routing 

Although it has many uses, Policy Based Routing (PBR) can be 
used to direct sensitive network data over paths with higher security. 
Possibilities include sending telnet traffic over an IPSec encrypted 
tunnel or transmitting critical application data over dedicated
interfaces. Within a security context and when altering the next 
hop of an IP packet, PBR configurations are router-centric. This 
requires that each router along the traffic path from source to des- 
tination be manually configured if a policy-based action is required. 

There are several components to a PBR implementation. First, 
access lists are defined to determine to which traffic a policy is 
applied. IP standard and IP extended access lists can be created with 
the access-list global configuration command. IP standard access
lists only test the IP source address field and should be passed over in
favor of the more flexible IP extended and named IP extended
access list types. Named IP extended access lists provide improved
editing capabilities and the ability to add inline comments. The 
global configuration command ip access-list extended is used 
to create named IP extended access lists. 

Route maps are then created to pair packet conditions with setting 
changes. The route map essentially takes a packet matching an access 
list and applies one or more alterations to the packet. For example, 
packets between two hosts can have their next hop altered so that they 
are sent over a slower but more secure link regardless of what has 
been learned via a routing protocol or configured via static routes.
Because only a single route map can be applied to an interface, a route
map may consist of multiple instances. Each instance of a
route map can apply different settings to different traffic. Instances
are applied to traffic in order using sequence numbers, from low to 
high. A route map instance is created with the route-map global 
configuration command, which has the following syntax: 


route-map NAME ACTION SEQNUM 
For use in PBR, the action field, shown above as ACTION, should
always be permit. This tells PBR to apply the settings of the route
map to the packets matching the conditions of the route map. The
access list is linked to the route map instance with the match ip
address route map configuration command as shown below:


route-map NAME ACTION SEQNUM
match ip address ACL


It is possible to effect several changes on a packet or traffic flow
using policy-based routing. The one most often used within a security
context is the set ip next-hop command, which alters the path of a
traffic flow. Below is the complete syntax to create a route map
instance that changes the next hop of a packet:


route-map NAME ACTION SEQNUM
match ip address ACL
set ip next-hop NEXTHOP


Once the access list and route maps have been created, they are 
then applied to an interface with the ip policy route-map inter- 
face configuration command. Policy routing will look at all traffic 
arriving as input on an interface and does not look at traffic to be 
transmitted out of an interface. The following sample configuration 
will send all telnet traffic to host 192.168.4.100 over a slower point- 
to-point serial line instead of the faster frame relay link: 


interface Ethernet1
ip address 192.168.1.1 255.255.255.0
ip policy route-map ROUTE-MAP
interface Serial2
description T1 to Frame Relay
ip address 192.168.2.1 255.255.255.252
encapsulation frame-relay
interface Serial3
description 56k to Site B, remote router 192.168.3.2
ip address 192.168.3.1 255.255.255.252
!
ip access-list extended ROUTE-MAP-ACL
permit tcp any host 192.168.4.100 eq 23
!
route-map ROUTE-MAP permit 10
match ip address ROUTE-MAP-ACL
set ip next-hop 192.168.3.2


Although we did not need it for this example, it is possible to 
expand the ACL to include many types of traffic and settings using 
multiple instances of the same route map. In addition to the secu- 
rity-related next hop change in our last example, this example also 
directs Web traffic to a Web cache device: 


ip access-list extended ROUTE-MAP-ACL
permit tcp any host 192.168.4.100 eq 23
!
ip access-list extended WEB-CACHE-ACL
deny tcp host 192.168.1.100 any eq 80
permit tcp any any eq 80
!
route-map ROUTE-MAP permit 10
match ip address ROUTE-MAP-ACL
set ip next-hop 192.168.3.2
!
route-map ROUTE-MAP permit 20
match ip address WEB-CACHE-ACL
set ip next-hop 192.168.1.100


While it is possible to have multiple entries in access lists and 
multiple route map instances, only a single route map may be 
applied to an interface. 
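
The way route map instances are tried in sequence-number order can be
sketched in Python. This is a simplified model rather than IOS
behavior in full: the helper names acl_permits and policy_next_hop are
invented for the example, an ACL is reduced to a list of (action, test)
pairs with an implicit deny at the end, and the fallback address
192.168.2.2 is a hypothetical next hop standing in for whatever the
normal routing table would choose.

```python
def acl_permits(acl, pkt):
    """First matching entry wins; no match at all means an implicit deny."""
    for action, test in acl:
        if test(pkt):
            return action == "permit"
    return False


def policy_next_hop(instances, pkt, routed_next_hop):
    """Try route map instances from low to high sequence number."""
    for seq, acl, next_hop in sorted(instances, key=lambda inst: inst[0]):
        if acl_permits(acl, pkt):
            return next_hop
    return routed_next_hop  # No instance matched: route normally.


telnet_acl = [("permit", lambda p: p["dst"] == "192.168.4.100" and p["dport"] == 23)]
web_cache_acl = [
    ("deny", lambda p: p["src"] == "192.168.1.100" and p["dport"] == 80),
    ("permit", lambda p: p["dport"] == 80),
]
# Listed out of order on purpose; the sequence numbers decide the order.
instances = [(20, web_cache_acl, "192.168.1.100"), (10, telnet_acl, "192.168.3.2")]

normal = "192.168.2.2"  # Hypothetical next hop from the routing table.
telnet_hop = policy_next_hop(
    instances, {"src": "192.168.1.50", "dst": "192.168.4.100", "dport": 23}, normal)
web_hop = policy_next_hop(
    instances, {"src": "192.168.1.50", "dst": "10.1.1.1", "dport": 80}, normal)
cache_hop = policy_next_hop(
    instances, {"src": "192.168.1.100", "dst": "10.1.1.1", "dport": 80}, normal)
print(telnet_hop, web_hop, cache_hop)
```

The deny entry for 192.168.1.100 is what keeps the Web cache's own
requests from being sent back to the cache: its packets match no
instance and are routed normally.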

To improve the performance of policy routed packets, the ip
route-cache policy interface configuration command should be
used to enable fast switching on interfaces with policy routing
enabled. Fast switching utilizes a route cache to greatly improve
router performance and lets packets pass through the router without
interrupting the main CPU. To verify that policy-routed packets
are fast switched, use the show ip interface EXEC command and
look for the line that lists the IP route-cache flags. If this line
contains the keyword “Policy”, fast switching of policy-routed
packets is enabled on that interface.


December 2001


TCP Intercept 

With TCP Intercept, a router can reduce the effect of SYN flood 
attacks on an organization’s servers. Specifically, TCP Intercept 
monitors TCP connections matching an access list and will time out 
half-open sessions much more aggressively than most 
server operating systems. 

It is possible to configure TCP Intercept in one of two modes. 
When configured for the first and default mode, the router will 
establish connections on the server’s behalf with outside clients. 
Once the connection has been successfully established between the 
router and client, the router will open a session to the server and 
connect the two sessions. This operation is transparent to both the 
client and server, but does require more router resources. This mode 
is known as “intercept mode”. 

In the other mode, called “watch mode”, the router passively 
monitors TCP connections to the servers. Under normal circum- 
stances, the router has no part in successful connections. If, how- 
ever, a connection fails to reach ESTABLISHED state before a 
timer expires, the router sends a TCP reset to the server to close the 
connection. This timer can be changed with the ip tcp intercept 
watch-timeout global configuration command and defaults to 30 
seconds. The TCP Intercept mode is set with the global configura- 
tion command ip tcp intercept mode. 

If the server has 1100 incomplete connections at any point, or 
if there have been 1100 connection attempts in the last minute, 
TCP Intercept with default settings will behave much more 
aggressively. While in this state, each new connection attempt 
causes an existing half-open connection to be dropped. By 
default, the router will drop the oldest incomplete connection, or 
alternatively can be configured to drop a random connection 
using the ip tcp intercept drop-mode global configuration 
command. If TCP Intercept is configured for watch mode opera- 
tion, the timeout for half-open connections is cut in half (to 15 
seconds by default) when entering aggressive mode. 

The point at which TCP Intercept begins acting more aggres- 
sively, and the duration of aggressive mode, can be tuned with the 
following four global configuration commands: 


ip tcp intercept max-incomplete high VALUE 


When the number of incomplete connections reaches this value, 
TCP Intercept enters aggressive mode. The default setting for 
VALUE is 1100. 


ip tcp intercept max-incomplete low VALUE 


This global configuration command tells the router at what point 
it should return to normal operation from aggressive mode. The 
default setting for VALUE is 900. 


ip tcp intercept one-minute high VALUE 


When VALUE number of connection attempts have been received 
in the last minute, TCP Intercept will begin aggressive mode 
processing. The default setting for VALUE is 1100. 


ip tcp intercept one-minute low VALUE 


The router will continue to operate in aggressive mode until the 
number of connection attempts in the last minute drops below this 
value. The default setting for VALUE is 900. 

If either of the high values is reached, the router will enter 
aggressive mode and start dropping half-open connections. It 
remains in aggressive mode until both numbers drop below their low 
value thresholds. 
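
The way the high and low thresholds work together can be sketched in a few lines of Python. This is only an illustration of the behavior described for the four commands above, not IOS code: the names `InterceptThresholds` and `next_mode` are made up for this example, and treating "reaches" as an inclusive comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InterceptThresholds:
    # Default values quoted in the text above.
    max_incomplete_high: int = 1100
    max_incomplete_low: int = 900
    one_minute_high: int = 1100
    one_minute_low: int = 900


def next_mode(aggressive, incomplete, last_minute, t=InterceptThresholds()):
    """Return True if the router should be in aggressive mode.

    aggressive  -- whether the router is currently in aggressive mode
    incomplete  -- current number of half-open connections
    last_minute -- connection attempts seen in the last minute
    """
    if not aggressive:
        # Reaching either high threshold triggers aggressive mode
        # (inclusive comparison is an assumption of this sketch).
        return (incomplete >= t.max_incomplete_high
                or last_minute >= t.one_minute_high)
    # Once aggressive, stay there until BOTH counters are below
    # their low thresholds.
    return not (incomplete < t.max_incomplete_low
                and last_minute < t.one_minute_low)
```

The gap between the high and low values keeps the router from flapping in and out of aggressive mode when the counters hover near a single threshold.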

Here is a complete sample configuration utilizing default values 
and showing the minimum configuration required: 


! 
ip tcp intercept list 100 
! 
access-list 100 permit tcp any 192.168.1.0 0.0.0.255 
! 


Below is a more extensive example that changes the intercept mode, 
drop mode, and some timeout values: 


! 
ip tcp intercept list 100 
ip tcp intercept mode watch 
ip tcp intercept drop-mode random 
ip tcp intercept one-minute high 2000 
ip tcp intercept one-minute low 1600 
! 
access-list 100 permit tcp any 192.168.1.0 0.0.0.255 eq www 
access-list 100 permit tcp any 192.168.1.0 0.0.0.255 eq smtp 
! 


There are only two show commands that can provide insight 
into the operation of a TCP Intercept configuration. The first, 
show tcp intercept connections, will show the active complete 
and incomplete TCP sessions. It lists the state of each connection, 
the time since the connection was created, and the time until it will 
time out. The second command, show tcp intercept statistics, 
lists the number of complete and incomplete sessions, as well as 
the one-minute connection rate. This information will give you 
an important head start when beginning to tune your TCP 
Intercept configuration. 


Unicast Reverse Path Forwarding 

With Unicast Reverse Path Forwarding (Unicast RPF), you 
can eliminate spoofed IP packets at the network edge. Unicast 
RPF inspects each input packet on enabled interfaces and verifies 
that the router received the packet over the correct interface using 
the local routing table. While in some situations this could be 
accomplished more simply with access lists, Unicast RPF will 
guarantee that packets arriving on the border router’s outside 
interface do not claim to be sourced by internal clients. 
Furthermore, ACLs require manual editing and could become 
tedious and error prone in dynamic environments or where 
dynamic routing protocols are used between business partners or 
large remote sites. 
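
The reverse-path check itself is simple enough to sketch in Python. The routing table, interface names, and function names below are hypothetical, chosen only for illustration; a real router performs this lookup in its forwarding table rather than by scanning a list.

```python
import ipaddress

# Hypothetical routing table: (prefix, interface used to reach it).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "Ethernet0"),  # internal LAN
    (ipaddress.ip_network("10.20.0.0/16"), "Serial0"),      # business partner
    (ipaddress.ip_network("0.0.0.0/0"), "Serial1"),         # default route
]


def best_route_interface(address, routes=ROUTES):
    """Longest-prefix match: the interface used to reach `address`."""
    addr = ipaddress.ip_address(address)
    matches = [(net, ifc) for net, ifc in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def passes_rpf(source, arrival_interface, routes=ROUTES):
    """Accept a packet only if the route back to its source address
    points out the interface the packet arrived on."""
    return best_route_interface(source, routes) == arrival_interface
```

With this table, a packet arriving on Serial0 that claims the internal source 192.168.1.50 fails the check, because the route back to that address points out Ethernet0.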

A typical and appropriate scenario for Unicast RPF would be 
for organizations with connections to external business partners 
or possibly remote locations with untrusted users. Applying 
Unicast RPF to router interfaces connecting these types of locations 
can make sure that they are not the source of spoofed network 
packets and certain types of denial of service attacks. Internet 
service providers can also improve the security of their own net- 
work and the Internet in general by implementing Unicast RPF 
on the connections to their customers. 

Since there is a certain level of traffic asymmetry with multi- 
homed Internet connections, it is not possible to ensure that all 
traffic will take the same path into and out of the network. For 
this reason, Unicast RPF is not suitable for the border router or 
routers of organizations with multiple Internet connections. 
Implementing Unicast RPF in this setting could result in the 
blocking of valid traffic. 

Unicast RPF is available in Cisco IOS 12.0 and later, and only 
on platforms that support Cisco Express Forwarding. Configuration 
is straightforward: enter the command ip verify unicast 
reverse-path in interface configuration mode. When configured 
on interfaces with ACLs defined, input ACLs are checked before 
Unicast RPF processes the packet. 


Restricting Telnet Access to Routers 

Only a relatively small number of workstations require the abil- 
ity to telnet to routers within the network in most environments. The 
access-class line configuration command gives us the ability to 
control exactly which workstations can access the routers, or alter- 
natively, which workstations cannot. Even if this does not seem 
appropriate for an organization’s internal routers, it should be con- 
figured on routers available from the Internet. The access-class 
command takes an argument specifying the ACL to use when 
permitting or denying inbound telnet connections: 


access-list 90 permit 1.1.1.0 0.0.0.255 
! 
line vty 0 4 
 access-class 90 in 


The above example will only allow devices with source IP 
addresses of 1.1.1.0 to 1.1.1.255 to telnet to this router. 
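
The wildcard mask in that access list is the inverse of a subnet mask: a 0 bit means "must match" and a 1 bit means "don't care". A short Python sketch of the comparison follows; the function name `wildcard_match` is made up for this illustration.

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """Return True if `address` matches `base` under a Cisco-style
    wildcard mask (0 bits must match, 1 bits are ignored)."""
    addr = int(ipaddress.IPv4Address(address))
    base_i = int(ipaddress.IPv4Address(base))
    # Invert the wildcard to get the bits that must be compared.
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (base_i & care)
```

Here `wildcard_match("1.1.1.77", "1.1.1.0", "0.0.0.255")` is true, while an address such as 1.1.2.1 does not match and would be refused by the access-class.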


Disallowing Directed Broadcasts 

Used in many of the prevalent denial of service attacks, directed 
broadcasts allow a remote computer to send a broadcast to all of the 
computers on a subnet. Until version 12.0, Cisco routers would by 
default forward these broadcasts and convert them to MAC-level 
broadcasts once they reached the destination subnet. In all versions of 
IOS, the interface configuration command ip directed-broadcast 
enables this functionality, and its no form, no ip directed-broadcast, 
disables it. 

Providing further functionality and control, the ip forward- 
protocol global configuration command customizes which broad- 
casts are allowed and converted. This command has a wide list of 
supported protocols, which includes tftp, bootp, and netbios-ns. 

The ip directed-broadcast command takes an optional 
argument specifying an access list used to filter the broadcasts. 
This provides additional granularity when specifying which broad- 
casts are forwarded. In the following example, we permit only 
broadcasts sourced within our network, which uses the 192.168.0.0 
private address space: 


interface Ethernet0 
ip directed-broadcast 100 
! 


access-list 100 permit ip 192.168.0.0 0.0.255.255 any 


Conclusion 

Network security is an elusive target and is considered by many 
to be unattainable. Although these are not all of the options, they are 
tools that when used correctly can improve network security. 

Tim Sammut, CCIE #6642, is a Sr. Network Engineer for Logicon FDC and provides 
network design and implementation services to the federal government in the Maryland 
and Washington DC areas. He can be contacted at: tim.sammut@feddata.com. 


Parsing Interesting Things 


Randal L. Schwartz 


Someone recently popped into one of 
the newsgroups I frequent and asked 
how to parse an INI file. You might 
have seen those before, with sections and 
keyword=value lines, like: 


[login] 
timeout=30 
remote=yes 


[password] 
minlength=6 


I think they started in the Microsoft 
world, since no sane UNIX hacker would 
have come up with something like that. No, we come up with things 
like .Xdefaults and sendmail.cf and termcap. But the request 
seemed simple: parse the file and gather the information into a hash 
for quick access, two levels deep, of course. 

Now, I usually carry the banner here for “use the CPAN”, and in 
fact, there are numerous CPAN modules that parse INI files (too 
many, I think). But let’s take a different route here. Suppose we 
were parsing a file that wasn’t already CPANned to death. What 
tools could we use? 

Well, certainly Perl’s regular expressions are pretty powerful in 
the first place, and this task really wouldn’t be that difficult with 
hand-written code, but we can go a bit further and pull out a nifty 
tool from the CPAN: the “madman of Perl” Damian Conway’s 
Parse::RecDescent. This module permits extremely complex 
parsers to be built by specifying a nice hierarchical description of 
the data (as a grammar), and a series of actions to be taken as each 
portion of the data is returned. I find it very simple to use, and 
whipped up a parser in no time. 

The key to a useful grammar is getting the description right, and 
what to do once you’ve seen that. First, let’s look at a file. A file is a 
series of sections, so in the grammar language, that’s given as: 


file: sections 


Actually, a file is a bit more than that. If we just used that, the 
grammar would match any prefix of the input that also had sections. 
So, we need to anchor that: 


file: sections /\z/ 


Which says, match sections, and when you’re done matching 
sections, match the end of the string. If you’re not at the end of 
the string when you are done matching sections, this isn’t a file 
that we want. 

And now, sections is zero or more sections, which we write as: 


sections: section(s?) 


with the (s?) suffix meaning “zero or more”. 
Very readable so far. A section is a section 
marker (the square-bracket line) and some 
definitions: 


section: section_marker definitions 
definitions: definition(s?) 


And we’ve defined the definitions as well. 
So far, we’ve managed to capture the 
essence of an INI-like file, but we’ve not 
actually matched anything (except the end 
of string). That’s because we’ve been con- 
structing “non-terminals”. Grammar rules 
can also contain “terminals” (like the end-of-string token 
above) to define specific things to match. Let’s start with a sec- 
tion marker: 


section_marker: /\[.*\]/ 


There. A section marker is a square-bracketed thingy. And what’s a 
definition? 


definition: key /=/ value 


Yeah, it’s a key and a value, separated by an equals. But what are 
those? Why, more terminals! 


key: /\w+/ 
value: /.*/ 


And already with just a few lines of code, we’ve defined most of the 
grammar. But now we need to introduce a bit more knowledge 
about Parse::RecDescent. Between each of the items of the rules, 
the generated parser will be permitted to skip over the current skip 
string, which is “whitespace” by default. This is fine for section 
markers: we don’t mind any preceding whitespace being tossed. But 
it’s a pain if whitespace gets in-between the key and the rest of the 
line. Fortunately, we can define that the skip string be altered for the 
remainder of a rule: 


definition: key <skip: ''> /=/ value 


which means that the string '' (the empty string) is now the skip 
string, meaning that the equals must be adjacent to the end of the 
key, and the value starts immediately after the equals. Good! 

We could stick all the rules above into a string $GRAMMAR, and 
then create a parser $PARSER using these rules as: 


use Parse::RecDescent; 

my $PARSER = Parse::RecDescent->new($GRAMMAR) 
    or die; 


This $PARSER can then be used repeatedly to see whether a file fits 
the specifications. To do that, we call the top-level rule (file) as a 
method, passing it $INPUT, the contents of the file in question: 


if (defined(my $result = $PARSER->file($INPUT))) { 
    print "It's a valid INI file!\n"; 
} else { 
    print "No good.\n"; 
} 


Now, if all we were doing was verifying well-formedness, that’s 
enough. But we wanted to also use the data as it was parsed. To do 
that, we need to also know that every rule is like a subroutine call, 
and passes back the last value evaluated. By default, that’s the string 
matching the terminal (or $1 if it’s included), or whatever value the 
last subrule returns. (For the repetitions above, an arrayref is 
returned of all the matches, if any.) However, we can include some 
Perl code enclosed in a block as the last item of a rule, and then that 
will be the return value. 

For example, we really don’t want the brackets included in the 
section marker, so we can select (using $1) them away: 


section_marker: /\[(.*)\]/ 


There. Now the brackets are not part of the return value. If we didn’t 
know that $1 is automatically returned, we could return it explicitly: 


section_marker: /\[(.*)\]/ { $1 } 


which says to perform the regex match, and if it succeeds, evaluate 
the block. As long as the block doesn’t return undef, it’s also 
considered a “match”, and as the last thing in a rule, it’s also the overall 
value of the rule. 

But what about the definitions? We want to note both the key 
and the value, so we’ll use some sort of Perl block at the end of the 
rule. And we can return an arrayref of the two items just fine, but we 
need to access the “value” of the key and value subrules through the 
magical %item hash. The keys to this hash are the names of the 
subrules. (Sorry for the overloading of the key/value terms here.) 


definition: key <skip: ''> /=/ value 
{ [$item{key}, $item{value}] } 


And now a definition is an arrayref, consisting of the found key, 
and its found value. (If there’s more than one item called “key”, 
then you must resort to positional syntax, but it’s almost always 
easier and clearer to just invent a new non-terminal name for that 
particular slot.) 

Similarly, a section needs the name of the section and all of the 
definitions of that section. 


section: section_marker definitions 
{ [$item{section_marker}, $item{definitions}] } 


Note that definitions will already be an arrayref of individual 
definitions, which are themselves references to two-element arrays. All 
this stacking is taken care of automatically by the parser built by 
Parse::RecDescent! 

Finally, the fun part. A file wants to be all the sections. And we 
could just punt and return that: 


file: sections /\z/ { $item{sections} } 


which will then be an arrayref pointing to a list of sections, each 
section being an arrayref pointing to a list of definitions in that 
section, each definition being an arrayref pointing to a key/value 
tuple. But let’s convert this into a hash for quick access: 


file: 
  sections /\z/ { 
    my %return; 
    my $sections = $item{sections}; 
    for my $section (@$sections) { 
      my ($section_marker, $definitions) = @$section; 
      for my $definition (@$definitions) { 
        my ($key, $value) = @$definition; 
        # alias the hash slot to $_ for the tests below 
        for ($return{$section_marker}{$key}) { 
          if (not defined $_) { 
            $_ = $value;           # first time seen 
          } elsif (not ref $_) { 
            $_ = [$_, $value];     # second time: promote to arrayref 
          } else { 
            push @$_, $value;      # third and later: append 
          } 
        } 
      } 
    } 
    \%return; 
  } 


Wow. What was that? Well, first we define a hash to be returned 
(as a hashref), and then walk the multiple levels of the arrayrefs 
of arrayrefs of tuples. The interesting part starts in the middle, 
which is merely aliasing $return{$section_marker}{$key} to 
$_ for the rest of the inner loop. If that value isn’t defined, then 
this is the first time we’ve seen a keyword under a given section, 
so we stuff the value. If it’s already defined, then we’ve seen the 
same keyword twice. In this case, I decided to turn the value into 
an arrayref, so that the values are individually extractable. And 
finally, if it’s already an arrayref, then we just push the latest hit 
onto the end. 

The return value of calling the file method is now either this 
hashref, or undef. So to get the “timeout” parameter from the 
example INI file above, we’d say: 


my $timeout = $result->{login}{timeout}; 


Because the names are case sensitive, we might want to add a few 
other things to force all the section names and keys to lowercase, or 
perhaps we could do that while we were building the hash. 
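
For readers who want to experiment with the folding step outside of a grammar action, here is the same idea as a rough Python sketch, with the lowercasing applied while the hash is built. It is not part of the Perl parser: the function name `fold_sections` and the sample data are invented for this illustration, and the input is assumed to have the same shape as the nested arrayrefs described above (a list of section/definitions pairs).

```python
def fold_sections(sections, lowercase=True):
    """Fold [(section, [(key, value), ...]), ...] into a two-level dict.

    A key seen once maps to its value; a key seen more than once
    maps to a list of all its values, in order.
    """
    result = {}
    for marker, definitions in sections:
        if lowercase:
            marker = marker.lower()
        bucket = result.setdefault(marker, {})
        for key, value in definitions:
            if lowercase:
                key = key.lower()
            if key not in bucket:
                bucket[key] = value                  # first time seen
            elif not isinstance(bucket[key], list):
                bucket[key] = [bucket[key], value]   # second time: make a list
            else:
                bucket[key].append(value)            # later times: append
    return result


# Hypothetical parse result, shaped like the grammar's return value.
parsed = [
    ("Login", [("timeout", "30"), ("remote", "yes"), ("Remote", "no")]),
    ("password", [("minlength", "6")]),
]
settings = fold_sections(parsed)
```

Because the names are lowercased as they are stored, `remote` and `Remote` land in the same slot and are collected into a list.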

There you have it: an INI-like file parser made with 
Parse::RecDescent. I hope this brief intro to this powerful module 
will get you interested enough to read the rest of the documentation 
and study its amazing array of features. And you’ll never fear 
parsing an odd-looking file again. Until next time, enjoy! 

Randal L. Schwartz is a two-decade veteran of the software industry — skilled in software 
design, system administration, security, technical writing, and training. He has 
coauthored the “must-have” standards: Programming Perl, Learning Perl, Learning Perl for 
Win32 Systems, and Effective Perl Programming, as well as writing regular columns for 
WebTechniques and Unix Review magazines. He’s also a frequent contributor to the Perl 
newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 
1985, Randal has owned and operated Stonehenge Consulting Services, Inc. 


SECURITY 


Implementing Kerberos 


David Smith 


Kerberos is the mythical three-headed dog guarding the gates 
of the underworld. It has long been a symbol of unceasing 
vigilance and was therefore selected as the name of the 
authentication service for MIT’s Project Athena (a collaboration 
among IBM, DEC, and MIT to develop a ubiquitous networking 
service for undergraduates at MIT). Project Athena included such 
innovations as the X Window system, a Directory service named 
Hesiod, and a messaging service named Zephyr, but only X and 
Kerberos have become prevalent outside MIT. 

Kerberos was designed for an environment with insecure work- 
stations, an insecure network, and moderately secure servers. These 
assumptions are paralleled in the current Internet, making Kerberos 
an effective authentication service in modern distributed environ- 
ments. It is not, however, a complete security service: Kerberos pro- 
vides for authentication, not authorization or accounting, and is not 
easily extensible to multi-domain environments. The latter issue is 
being addressed by current work incorporating PKI support into 
Kerberos, but at present cross-realm authentication needs to be per- 
formed by explicitly extending trust between two Key Distribution 
Centers (KDC). 

Enhancements in Kerberos V allow hierarchical cross-realm 
trust using transitive trust between different domains. With this 
architecture, a KDC in realm PLEAIDES.TAURUS.TEST can have 
a cross-realm trust extended to TAURUS.TEST and will inherit 
trust to other subdomains of TAURUS.TEST. 

In this article, I will demonstrate how to implement 
Kerberos on multiple operating systems and discuss 
some of the issues that arise when using “kerberized” 
applications. I will begin with a discussion of the ter- 
minology and roles involved in a Kerberos system, 
following with the network design for the test domain 
and the actual implementation steps. For this article, I 
am using the Kerberos Version 5 distribution obtained 
from MIT: 


http://web.mit.edu/network/kerberos-form.html 


This site provides binaries and sources for destina- 
tions in the United States and Canada; exportable 
versions for use by individuals in other countries 
are available at: 


http://www.crypto-publish.org 


Protocol 

Kerberos is a method to distribute session keys for 
use in encrypting traffic between two entities in a net- 
work. These session keys include a timestamp so that 
they are valid for a limited duration (preventing replay 
attacks), and are provided by a Key Distribution 
Center (KDC). The KDC must be implemented on a 
secure platform, because it holds the identities of all 
parties involved in the Kerberos system. The KDC contains two 
services: an Authentication Service that handles the initial request and 
a Ticket Granting Service that makes use of an encrypted channel 
for requests to other services. The KDC host should not be used for 
other services because of the danger that those services might compromise 
the Kerberos database. If we use the example of one party, Alice, 
attempting to access a service on the system named Bob, the 
protocol takes the following steps: 


1. Alice requests access to the Ticket Granting Server (TGS) from 
the KDC in plaintext. 

2. The KDC sends a session key for the TGS (encrypted in Alice’s 
secret key). 

3. Alice uses the session key to get a Ticket Granting Ticket (TGT) 
from the TGS and subsequently uses that session key and the 
TGT to obtain tickets for new services. This prevents her from 
having to continually send traffic encrypted in her secret key 
(which would give more data to a brute-force decoding attack). 

4. When Alice requests a service on Bob, she sends a request to the 
TGS encrypted with her session key that contains her TGT, a 
newly constructed Authenticator to identify her, and the service 
identifier for Bob. 

5. The TGS returns a ticket valid for use with Bob and also a new 
session key to use when communicating with Bob, encrypted in 
the session key for the TGS. 

6. Alice sends the Ticket (which includes the specific service 
requested) and her Authenticator for that ticket to Bob. 

7. Bob decrypts the ticket (which is encrypted with Bob’s secret 
key), and extracts the session key for the session with Alice. 
Bob uses the session key to decrypt the authenticator received 
from Alice. 


The Kerberos protocol ends at this point, with Alice and Bob 
mutually authenticated (since only the true Bob could decrypt the 
Ticket), and with a randomly generated session key they can use to 
encrypt further traffic for the service. Each service needs to have its 
own session key, although different applications might use the same 
principals to create their unique session keys. 




This protocol design has several implications. First, the KDC 
must be kept secure and highly available. Kerberos includes the 
concept of master/slave KDCs so that there is no single point of 
failure. Each service that uses Kerberos must be “kerberized” so 
that the initial connection makes use of the TGT to authenticate and 
provide the session keys to encrypt traffic. Thus, Kerberos cannot 
be used transparently to a non-kerberized application (such as can 
be done with SSH). As a growing number of applications are ker- 
berized, however, this issue is becoming less important. 


Implementation Steps 

Kerberos implementation involves several steps. Before 
installing the software it is necessary to design the Kerberos 
realm, the server locations, and ancillary 
services to keep the KDC functional. 
Then, the applications to be kerberized 
must be identified and integrated with 

For Kerberos to prevent replay attacks, 
each ticket and authenticator is 
timestamped. Therefore, all principals in a 
Kerberos realm must have a consistent 
value for the current time. This requires a 
time service available to all hosts in the 
realm (and consistent with the time used in 
other realms). I have used NTP for this 
purpose and installed NTP clients on all the 
systems with a local NTP server 
synchronized with the NTP network. 
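
To see why clock agreement matters, consider how a service might screen incoming authenticators. The Python sketch below is an illustration only, not MIT's implementation: the class name `ReplayCache` is invented, and the 300-second tolerance is used here as an assumed allowable clock skew.

```python
class ReplayCache:
    """Reject authenticators that are stale or have been seen before."""

    def __init__(self, max_skew=300.0):
        # Allowed difference, in seconds, between the client's
        # timestamp and the server's clock (assumed value).
        self.max_skew = max_skew
        self.seen = {}

    def accept(self, client, timestamp, now):
        # Forget entries too old to pass the skew test anyway.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t <= self.max_skew}
        # A timestamp outside the window is rejected outright; this is
        # why hosts with drifting clocks cannot authenticate.
        if abs(now - timestamp) > self.max_skew:
            return False
        key = (client, timestamp)
        # The same authenticator presented twice is treated as a replay.
        if key in self.seen:
            return False
        self.seen[key] = timestamp
        return True
```

A client whose clock is more than the allowed skew away from the server's is rejected even with valid credentials, which is the practical reason for running NTP everywhere in the realm.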


Realm Design 

Kerberos is divided into realms, each 
a self-contained implementation of a 
KDC (including an Authentication ser- 
vice and a TGS on the same system) with 
a set of registered principals. Originally, 
realm lookup was performed by defining 
the realm/DNS domain correspondence 
in a configuration file. But in Kerberos 
V, it is possible to use new DNS SRV 
records to locate Kerberos realms and 
principals using DNS. Kerberos V 
defaults to looking up KDCs through 
DNS, but requires explicit configuration 
to look up realms from DNS names. MIT 
states that the latter is more prone to 
spoofing and does not recommend it for 
sites accessible from the Internet. 
However, both realm and KDC lookups 
through DNS were used by Microsoft in 
their implementation for Windows 2000. 
Therefore, both will be used in this arti- 
cle to transparently integrate Windows 
2000 clients into a Kerberos realm. 

Each realm is named with a unique 
name, by convention an uppercase version 
of the DNS domain. For example, in the 
domain taurus.test, the Kerberos realm 
should be named TAURUS.TEST. If multi- 
ple realms are required in an organization, 
use descriptive names ending with the 
domain name, such as PLEAIDES.TAURUS.TEST. 
The names of realms and the KDCs for each realm can 
be defined in a configuration file on each system, or provided by 
special DNS records. Although realms are not hierarchical (where a 
realm named PLEAIDES.TAURUS.TEST would be a subsidiary of 
the TAURUS.TEST realm), the integration with DNS allows for 
realm lookups to be made through the DNS hierarchy. Kerberos V 
also allows transitive trust to be created between subdomains and 
their parent zones, extending trust to other subdomains of a com- 
mon parent. 

Each client will look for a TXT record in the DNS records for a 
target host. If there is a record in the host name or in any zone 
containing that host with the first component of “_kerberos”, then the 
rest of the TXT record is taken as the name of the realm. This allows 
multiple domains to be managed as a single realm, or separate hosts 
within a domain to be managed in separate realms. 
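
The search order described above can be sketched in Python. This is a simplified model rather than the resolver code in the MIT distribution: the `TXT_RECORDS` dictionary stands in for real DNS queries, and the function names are invented for this illustration.

```python
# Hypothetical TXT records standing in for real DNS lookups.
TXT_RECORDS = {
    "_kerberos.taurus.test": "TAURUS.TEST",
}


def candidate_names(host):
    """Names to try, starting at the host and walking up the DNS tree."""
    labels = host.rstrip(".").lower().split(".")
    return ["_kerberos." + ".".join(labels[i:]) for i in range(len(labels))]


def realm_for_host(host, txt=TXT_RECORDS):
    """Return the realm from the first _kerberos TXT record found,
    or None if no enclosing domain publishes one."""
    for name in candidate_names(host):
        if name in txt:
            return txt[name]
    return None
```

For electra.pleaides.taurus.test, the names tried are _kerberos.electra.pleaides.taurus.test, then _kerberos.pleaides.taurus.test, then _kerberos.taurus.test, where the lookup succeeds and yields TAURUS.TEST.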

Each realm then has a DNS style name; and, in the zone records 
for that domain, special SRV records identify the master and slave 
KDCs and the Kerberos admin server. These services are defined as 
follows: 


• _kerberos._udp — This identifies the Kerberos service on any 
KDC, both master and slave servers. Only active servers should 
be included in this list; if the Kerberos daemon is not running, the 
server name should be removed. 

• _kerberos-master._udp — This identifies Kerberos service on 
the master KDC, so that a client can make a final decision on 
authenticating a user where they may have changed their 
password but not had enough time for the 
change to propagate. 

• _kerberos-adm._tcp — This identifies a 
server running the kadmind program. 
Only the master KDC runs this program, 
which is used to modify the Kerberos 
database. At present, the kadmin 
program still needs to look up the admin 
server using the krb5.conf file, but other 
applications such as Windows 2000 
expect to use the DNS records. 

• _kpasswd._udp — This identifies the 
KDC where password changes can be 
recorded. 


MIT recommends that the KDCs be 
named with aliases that can be used in the 
krb5.conf files as well. These aliases should 
be kerberos for the master KDC and ker- 
beros-1, kerberos-2 for all slave KDCs. 
These aliases are not used in the DNS imple- 
mentation, which must use the true name of 
the host in the SRV record. Therefore, while 
the use of “kerberos” aliases should be con- 
tinued to support transparent failover for 
Kerberos IV clients, both the aliases and the 
canonical names must be registered as princi- 
pals in the Kerberos realm. 

For this project, I configured a realm 
named TAURUS.TEST with the master 
KDC named aldebaran.taurus.test, a slave 
KDC (also supporting SMB) named 
ain.taurus.test, and several clients in the 
domain pleaides.taurus.test. Aldebaran is 
a 100-MHz Pentium running BSD/OS 
Version 3.0, ain is a 350-MHz Intel 
Pentium III, while the client systems 
range from a Motorola StarMax 3000 
running Mac OS 9.1 to a 500-MHz Pentium 
II running Windows 2000. 

The DNS zone for taurus.test therefore 
includes the following records: 


$TTL 86400 
@ SOA ain.taurus.test. root.ain.taurus.test. ( 
        2001080805 ; serial 
        10800      ; refresh 
        3600       ; retry 
        3600000    ; expire 
        86400 )    ; minimum 
  NS ain.taurus.test. 

ain A 192.168.1.1 
kerberos-1 CNAME ain 


hyadum A 192.168.1.2 
alcyone A 192.168.1.10 
taygeta A 192.168.1.41 
aldebaran A 192.168.1.25 


kerberos CNAME aldebaran 


electra.pleaides A 192.168.1.30 
merope.pleaides A 192.168.1.35 


_kerberos TXT "TAURUS.TEST" 
_kerberos._udp SRV 0 0 88 aldebaran 
_kerberos-master._udp SRV 0 0 88 aldebaran 
_kerberos-adm._tcp SRV 0 0 749 aldebaran 
_kpasswd._udp SRV 0 0 464 aldebaran 


Source Configuration and Installation 

I downloaded the Kerberos V 1.2.2 source from MIT into 
/usr/local/src on aldebaran. The source for 1.2.2 comes in one 
gzipped tar file that unpacks into the krb5-1.2.2 directory tree. This 
tree includes the doc and src subtrees. 

For cross-platform builds, Kerberos supports an object tree so 
that one source tree can be used by multiple systems. The 
Kerberos source includes the client utilities required to communi- 
cate to the KDC and the libraries needed by kerberized applica- 
tions. Support for Kerberos IV clients is normally provided 
(except by Windows 2000) so that older clients can use a 
Kerberos V server for authentication. Because of the number of 
different operating systems in this test network, I exported the 
Kerberos source tree so that executables for each system could be 
built from a centralized resource. 

Most operating systems (including OpenBSD) ship with
Kerberos IV, not Kerberos V. Kerberos IV has known buffer
overflow vulnerabilities and is no longer supported by MIT.
Therefore, I obtained the latest source from MIT. Even with a
current version of the operating system, I recommend obtaining
the source from MIT to stay up to date with patches and bug
fixes. Kerberos requires a working C compiler, GNU make,
GNU bison, and, for testing, Perl and Tcl.


Glossary


Authentication Service - A service on a KDC that verifies
principals and issues tickets for their services.

Key Distribution Center - A system that maintains the database
of principals in the Kerberos realm. It returns tickets for
use between authenticated principals based on requests from
one principal. Generally a KDC runs two Kerberos services:
the Authentication Service and the Ticket Granting Service.

principal - An entity within the Kerberos system.
Principals are identified by a three-part name such as
primary/instance@REALM. Principals can be people (in which
case no instance is specified), or Kerberized services defined by
the primary and located on an instance in a REALM.

realm - A network of Kerberos principals maintained in a single
database. A realm is identified by an uppercase name
matching a DNS zone.

Ticket - A data record containing the name of principal A
requesting access from principal B, encrypted in principal
B's secret key.

Ticket Granting Service - A service on a KDC that issues session
keys for use by other principals.

Ticket Granting Ticket - A ticket to a TGS that allows a principal
to request access to other services in the Kerberos
realm.

Kerberos uses the GNU configure script to tailor the source for 
each platform. By default, DNS resolution of Kerberos realms is not 
enabled (because of concern for spoofing), so it will need to be 
enabled. Also, different operating systems have specific require- 
ments that are identified in the Installation guide (unpacked into the 
doc directory and available on the MIT site). The default destination 
prefix is /usr/local, and the default KDC database directory is 
$PREFIX/var/krb5kdc. 

The makefiles include a target for testing the software, so it is
important to run “make check” to verify that the code works before
installing it. The default tests merely check that the components are
built correctly, but support for the DejaGNU test suite is integrated;
if DejaGNU is present, the tests also exercise Kerberos client/server
interaction. I built Kerberos on BSD/OS with the following commands
(with the exported source tree mounted on the system being built):

% cd /usr/local/obj
% /usr/local/src/krb5-1.2.2/src/configure --enable-dns
% make
% make check
% make install


KDC Configuration 

Once the software is built and installed, the KDC must be configured.
This is done by initializing the database, adding administrators,
and configuring master and slave KDCs. The configuration files
on the KDC servers define the location for the realm databases
and the default lifetimes for tickets. The configuration files are
text files structured similarly to Microsoft .INI files. The kdc.conf
file is found on all KDCs in a Kerberos realm (usually located in
the Kerberos directory /var/krb5kdc/) and in this implementation
contains the following information:


[kdcdefaults]
    kdc_ports = 88,750 # this supports the V5 (88) and V4 (750) clients

[realms]
    TAURUS.TEST = {
        database_name = /usr/local/var/krb5kdc/principal
        admin_keytab = /usr/local/var/krb5kdc/kadm5.keytab
        acl_file = /usr/local/var/krb5kdc/kadm5.acl
        dict_file = /usr/local/var/krb5kdc/kadm5.dict
        key_stash_file = /usr/local/var/krb5kdc/.k5.TAURUS.TEST
        kadmind_port = 749
        max_life = 10h 0m 0s
        max_renewable_life = 7d 0h 0m 0s
        master_key_type = des3-hmac-sha1
        supported_enctypes = des3-hmac-sha1:normal \
            des-cbc-crc:normal des-cbc-crc:v4
        kdc_supported_enctypes = des3-hmac-sha1:normal \
            des3-cbc-crc:normal des-cbc-crc:normal
    }
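
The max_life and max_renewable_life values use a days/hours/minutes/seconds notation. As a minimal sketch, assuming only the d, h, m, and s suffixes shown in the listing (the helper name lifetime_seconds is hypothetical and not part of Kerberos), the following Python converts such a value to seconds so that lifetimes can be compared:

```python
import re

# Seconds per unit for the suffixes that appear in the kdc.conf listing.
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def lifetime_seconds(text):
    """Convert a string such as '10h 0m 0s' to a number of seconds."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s*([dhms])", text):
        total += int(amount) * _UNITS[unit]
    return total


max_life = lifetime_seconds("10h 0m 0s")
max_renewable_life = lifetime_seconds("7d 0h 0m 0s")
print(max_life, max_renewable_life)
```

The sketch ignores any other notation the KDC may accept; it is only meant to make the two values in this realm easy to compare.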


The krb5.conf file is located in /etc for all Kerberos V systems. 
Although DNS support has reduced the need for some sections of 
the krb5.conf file, that file is used to define defaults for Kerberos 
applications and is therefore a requirement for every host in a 
Kerberos realm. For most Kerberos hosts it is only necessary to 
keep the [libdefaults] stanza, adding the [realms] stanza with 
the admin_server option if access to the Kerberos admin tool is 
required from that host. For the KDC (master and slave), it contains 
the following: 

[libdefaults]
    ticket_lifetime = 600
    default_realm = TAURUS.TEST
    default_tkt_enctypes = des3-hmac-sha1 des-cbc-crc
    default_tgs_enctypes = des3-hmac-sha1 des-cbc-crc

[realms]
    TAURUS.TEST = {
        admin_server = kerberos.taurus.test:749
    }

[logging]
    kdc = FILE:/var/log/krb5kdc.log
    admin_server = FILE:/var/log/kadmin.log
    default = FILE:/var/log/krb5lib.log


Once these two files are created, the database must be initialized. 
That is performed with the command kdb5_util: 

% /usr/local/sbin/kdb5_util create -r TAURUS.TEST -s
Initializing database '/usr/local/var/krb5kdc/principal' for \
realm 'TAURUS.TEST', master key name 'K/M@TAURUS.TEST'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key: <type the master password>
Re-enter KDC database master key to verify: <again>
%


This creates the principal database. Because all the credentials
are stored in this database, it is imperative to keep the password
for this database secure. By default, the files created by this
command are kept in /usr/local/var/krb5kdc and include the
database itself (principal and principal.ok), the administrative
files (principal.kadm5.*), and the stash file (.k5.REALM).
The stash file is used to provide the master password to the
Kerberos daemon at system startup. It is not necessary if the startup
procedure is designed to prompt for the password at system
initialization. Although the password is encrypted and the file is
protected, truly paranoid sites may prefer not to use it at all. It is
created with the -s switch to the kdb5_util create command.

The next step is to create an administrator and add it to the ACL
file. The ACL file is defined in kdc.conf (the acl_file entry) and is
named by default “kadm5.acl”. The format is:


Kerberos_principal permissions optional_target_principal


The only entry required at this time is for an administrator. This
file controls access to the Kerberos database so ordinary users do
not need to be mentioned in it. Because the file syntax allows
wildcard entries, I prepared an initial file allowing all administrators
in the realm to have complete access to the database:


*/admin@TAURUS.TEST * 
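
To make the effect of the wildcard entry concrete, here is a small Python sketch that checks principal names against ACL lines. It uses shell-style matching from the standard fnmatch module as an approximation; the function name acl_permissions is hypothetical, and the real kadmind matching rules may differ in detail.

```python
from fnmatch import fnmatchcase

# The single ACL line prepared above: every */admin principal gets "*".
ACL_LINES = ["*/admin@TAURUS.TEST  *"]


def acl_permissions(principal, acl_lines):
    """Return the permission string of the first matching line, or None."""
    for line in acl_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # ignore blank, malformed, or comment lines
        pattern, perms = fields[0], fields[1]
        if fnmatchcase(principal, pattern):
            return perms
    return None


print(acl_permissions("admin/admin@TAURUS.TEST", ACL_LINES))
print(acl_permissions("joe@TAURUS.TEST", ACL_LINES))
```

A principal without the admin instance matches no line, which is why ordinary users need no entry in the file.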


Now we can create the initial administrator on the master KDC 
using the kadmin.local program. Both kadmin and kadmin.local 
perform the same functions, but kadmin.local is only available 
on the master KDC. Kadmin is available on any Kerberos sys- 
tem and allows for remote management. We need to create the 
first administrator locally so that we can use that principal to 
create slave servers. 


% /usr/local/sbin/kadmin.local
kadmin.local: addprinc admin/admin@TAURUS.TEST
WARNING: no policy specified for "admin/admin@TAURUS.TEST"; \
defaulting to no policy
Enter password for principal admin/admin@TAURUS.TEST: <password>
Re-enter password for principal admin/admin@TAURUS.TEST: <again>
Principal "admin/admin@TAURUS.TEST" created.
kadmin.local:


Before exiting from kadmin.local, it is necessary to create the 
keytab files for the special principals “kadmin/admin” and “kad- 
min/changepw”. This is done with the following commands (a 
backslash shows where two lines should be typed as one): 


kadmin.local: ktadd -k /usr/local/var/krb5kdc/kadm5.keytab \
kadmin/admin kadmin/changepw
Entry for principal kadmin/admin with kvno 3, encryption type
Triple DES cbc mode with HMAC/sha1 added to keytab
WRFILE:/usr/local/var/krb5kdc/kadm5.keytab.
Entry for principal kadmin/admin with kvno 3, encryption type
DES cbc mode with CRC-32 added to keytab
WRFILE:/usr/local/var/krb5kdc/kadm5.keytab.
Entry for principal kadmin/changepw with kvno 3, encryption type
Triple DES cbc mode with HMAC/sha1 added to keytab
WRFILE:/usr/local/var/krb5kdc/kadm5.keytab.
Entry for principal kadmin/changepw with kvno 3, encryption type
DES cbc mode with CRC-32 added to keytab
WRFILE:/usr/local/var/krb5kdc/kadm5.keytab.


Now, the KDC and Kerberos admin server can be started:


% /usr/local/sbin/krb5kdc 
% /usr/local/sbin/kadmind 


Each program will run as a daemon. These commands should be
added to the appropriate startup file for the system
(/etc/init.d/krb5 for Red Hat, /etc/rc.local for BSD systems).
Starting automatically will require a stash file. After starting the
daemons, check the log files to verify that both started correctly.
The file krb5kdc.log showed “commencing operation” as the last
line, while the file kadmin.log showed “starting”.


Adding Slave Servers 

The slave servers now need to be configured. Assuming that the 
Kerberos software has been compiled and installed on the slave 
server, the slaves must be created as Kerberos principals. They must 
also have their keytabs extracted, installed locally, and configured 
for database propagation. 

Each host in a Kerberos system needs to be defined as a principal
in the Kerberos database and have its secret key extracted to a
keytab file stored locally. The keytab file holds an encrypted copy of
the host’s password, but it must still be protected from unauthorized
access: make it unreadable by anyone but root and exclude it
from all backups. Both the canonical name and the Kerberos alias
must be added as principals for the master and slave servers, and
keytabs for both names must be stored locally. Additionally, the
host’s idea of its own name must match the principal name as a fully
qualified domain name. On Linux systems, this is done by making
the FQDN the first entry for the host in the /etc/hosts file (where
the hostname is picked up). Other operating systems will have
different techniques.

On aldebaran.taurus.test (kerberos), run the following as root:


% /usr/local/sbin/kadmin -p admin/admin
Password: <enter admin password>
kadmin: addprinc -randkey host/aldebaran.taurus.test
WARNING: no policy specified for "host/aldebaran.taurus.test@TAURUS.TEST"; \
defaulting to no policy.
kadmin: ktadd host/aldebaran.taurus.test
kadmin: Entry for principal host/aldebaran.taurus.test@TAURUS.TEST with \
kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to \
keytab WRFILE:/etc/krb5.keytab.
kadmin: Entry for principal host/aldebaran.taurus.test@TAURUS.TEST with \
kvno 3, encryption type DES cbc mode with CRC-32 added to keytab \
WRFILE:/etc/krb5.keytab.
kadmin: addprinc -randkey host/kerberos.taurus.test
WARNING: no policy specified for "host/kerberos.taurus.test@TAURUS.TEST"; \
defaulting to no policy.
kadmin: ktadd host/kerberos.taurus.test
kadmin: Entry for principal host/kerberos.taurus.test@TAURUS.TEST with \
kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to \
keytab WRFILE:/etc/krb5.keytab.
kadmin: Entry for principal host/kerberos.taurus.test@TAURUS.TEST with \
kvno 3, encryption type DES cbc mode with CRC-32 added to keytab \
WRFILE:/etc/krb5.keytab.
kadmin:


Do the same on the slave server, ain.taurus.test (kerberos-1): 


% /usr/local/sbin/kadmin -p admin/admin
Password: <enter admin password>
kadmin: addprinc -randkey host/ain.taurus.test
WARNING: no policy specified for "host/ain.taurus.test@TAURUS.TEST"; \
defaulting to no policy.
kadmin: ktadd host/ain.taurus.test
kadmin: Entry for principal host/ain.taurus.test@TAURUS.TEST with kvno 3, \
encryption type Triple DES cbc mode with HMAC/sha1 added to keytab \
WRFILE:/etc/krb5.keytab.
kadmin: Entry for principal host/ain.taurus.test@TAURUS.TEST with kvno 3, \
encryption type DES cbc mode with CRC-32 added to keytab \
WRFILE:/etc/krb5.keytab.
kadmin: addprinc -randkey host/kerberos-1.taurus.test
WARNING: no policy specified for "host/kerberos-1.taurus.test@TAURUS.TEST"; \
defaulting to no policy.
kadmin: ktadd host/kerberos-1.taurus.test
kadmin: Entry for principal host/kerberos-1.taurus.test@TAURUS.TEST with \
kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to \
keytab WRFILE:/etc/krb5.keytab.
kadmin: Entry for principal host/kerberos-1.taurus.test@TAURUS.TEST with \
kvno 3, encryption type DES cbc mode with CRC-32 added to keytab \
WRFILE:/etc/krb5.keytab.
kadmin:


Now we can enable propagation of the database from the master
to the slave KDCs. This process will eventually need to be performed
as a cron job, but initially we will perform it manually.
First, each KDC needs an ACL file for propagation (by default named
/usr/local/var/krb5kdc/kpropd.acl), containing the principals
for each of the KDCs:


host/kerberos.taurus.test@TAURUS.TEST
host/kerberos-1.taurus.test@TAURUS.TEST


The propagation daemon needs to be added to /etc/inetd.conf
as follows:

krb5_prop stream tcp nowait root /usr/local/sbin/kpropd kpropd
eklogin stream tcp nowait root /usr/local/sbin/klogind \
klogind -k -c -e

The following services must also be defined in the
/etc/services file:

kerberos 88/udp
kerberos 88/tcp
krb5_prop 754/tcp
kerberos-adm 749/tcp
kerberos-adm 749/udp
eklogin 2105/tcp

The dump propagation is done on the master KDC with the fol- 
lowing commands: 

% /usr/local/sbin/kdb5_util dump /usr/local/var/krb5kdc/slave_datatrans
% /usr/local/sbin/kprop -f /usr/local/var/krb5kdc/slave_datatrans \
kerberos-1.taurus.test


If successful, the kprop command will print “Database propagation
to kerberos-1.taurus.test: SUCCEEDED”, and the slave will
have a current copy of the /usr/local/var/krb5kdc/principal
database. Some common causes for failure are the inclusion of
hosts in the _kerberos._udp SRV records that are not running the
krb5kdc program, or a hostname on the slave that is not listed in the
known Kerberos principals. The Kerberos distribution includes a
shell script, krb5-1.2.2/src/slave/kslave_update, that can be started
as a cron job to automatically propagate the database to the named
slave KDCs.
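
For sites that prefer to script the two manual steps themselves, the following Python sketch builds the same kdb5_util dump and kprop commands for a list of slaves. It is not the kslave_update script from the distribution; the function names and the dry-run default are assumptions made for illustration, and only the command lines shown above are used.

```python
import subprocess

SBIN = "/usr/local/sbin"
DUMP_FILE = "/usr/local/var/krb5kdc/slave_datatrans"


def propagation_commands(slaves, sbin=SBIN, dump_file=DUMP_FILE):
    """Build the dump command followed by one kprop command per slave."""
    commands = [[f"{sbin}/kdb5_util", "dump", dump_file]]
    for slave in slaves:
        commands.append([f"{sbin}/kprop", "-f", dump_file, slave])
    return commands


def propagate(slaves, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for argv in propagation_commands(slaves):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stop at the first failure


# Dry run: shows what a cron job on the master KDC would execute.
propagate(["kerberos-1.taurus.test"])
```

Running the dump once and then looping over the slaves keeps every slave on the same snapshot of the database.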

It is now possible to create a stash file for the slave KDC and to
start the krb5kdc daemon. The kadmind daemon is only started on
the master server.

% /usr/local/sbin/kdb5_util stash
kdb5_util: Cannot read/find stored master key while reading master key
kdb5_util: Warning: proceeding without master key
Enter KDC database master key: <Enter database password>
% /usr/local/sbin/krb5kdc


Once the Kerberos daemon is restarted, the slave server can be
added to the _kerberos._udp SRV record in the DNS, and the DNS
can be restarted.


Adding Principals 

A Kerberos principal is a person or host known to the 
Kerberos system. A principal’s name is typically divided into 
three parts: PRIMARY/INSTANCE@REALM, although the 
instance portion is optional. For example, a host would use the 
format host/hostname.domain@REALM (although the REALM 
can be implied), while an individual would typically leave off the 
instance, simply using person@REALM. Microsoft, however, 
does not support this structure and requires a different format for 
cross-platform integration. 
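
The three-part naming scheme can be illustrated with a short Python sketch that splits a principal name into its parts. The Principal type and parse_principal function are hypothetical names for this example, the default realm is the one used in this article, and escaped characters in names are not handled.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name, default_realm="TAURUS.TEST"):
    """Split 'primary/instance@REALM'; instance and realm are optional."""
    body, _, realm = name.partition("@")
    primary, _, instance = body.partition("/")
    # An omitted instance becomes None; an omitted realm is implied.
    return Principal(primary, instance or None, realm or default_realm)


print(parse_principal("host/myhost.taurus.test@TAURUS.TEST"))
print(parse_principal("dsmith"))
```

A host principal carries the fully qualified host name as its instance, while a person usually has no instance at all.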

In Kerberos, only hosts that supply services need to be registered
as principals. Since services include telnet, ftp, and the Berkeley r*
commands (as well as ssh), in practice most UNIX hosts will need
to be registered. However, Windows and Macintosh hosts do not
need to be registered unless they are providing file services. As
described above, a host is typically registered by running the
kadmin program on it, creating the principal and exporting its
host key to the local keytab file. In this process, we rely on
Kerberos to generate a random key for the host password:

myhost % /usr/local/sbin/kadmin -p admin/admin
Password: <enter admin password>
kadmin: addprinc -randkey host/myhost.taurus.test
WARNING: no policy specified for "host/myhost.taurus.test@TAURUS.TEST";
defaulting to no policy.
kadmin: ktadd host/myhost.taurus.test
kadmin: Entry for principal host/myhost.taurus.test@TAURUS.TEST with kvno 3,
encryption type Triple DES cbc mode with HMAC/sha1 added to keytab
WRFILE:/etc/krb5.keytab.
kadmin: Entry for principal host/myhost.taurus.test@TAURUS.TEST with kvno 3,
encryption type DES cbc mode with CRC-32 added to keytab
WRFILE:/etc/krb5.keytab.
kadmin: exit
myhost %


Windows 2000 servers do not support the ktadd command, so they 
are created differently. According to Microsoft, the process for inte- 
grating a Windows 2000 workstation in an existing Kerberos realm 
requires the following steps: 


1. Add the Windows 2000 workstation as a principal in the realm. 
2. Identify the realm and KDC to the workstation. 
3. Map the usernames to principals in the realm for single sign-on. 


The first step is performed in the Kerberos realm using kadmin: 


% kadmin 
kadmin: ank -pw <password> host/merope.pleaides.taurus.test 
WARNING: no policy specified for \ 
host/merope.pleaides.taurus.test@TAURUS.TEST; \ 
defaulting to no policy 
Principal "host/merope.pleaides.taurus.test@TAURUS.TEST" created 
kadmin: 


The next two steps are performed on the Windows 2000 work- 
station. Microsoft supplies the commands ksetup and ktpass for 
this purpose. Although they are not installed by default, they can be 
added from the Windows 2000 distribution media by running 
SETUP from the \support\tools\ directory on the CD. The tools are 
placed in the directory \Program Files\Support Tools\, which is 
added to the path after rebooting. 


C:> ksetup /setdomain TAURUS.TEST
C:> ksetup /addkdc TAURUS.TEST kerberos.taurus.test
C:> ksetup /setmachpassword <password> #use the same one as above

It is now necessary to restart the Windows 2000 workstation.
After it reboots, the command to map users can be issued and
Windows will use the Kerberos V KDC for authentication. Multiple
maps can be provided, including wildcards. In this example, we
map the Kerberos admin user to Administrator and all others to
corresponding local users. Note that the use of instances in a
principal name for users is not supported by Windows (the slash
is interpreted as an NT domain).


C:> ksetup /mapuser admin@TAURUS.TEST Administrator 
C:> ksetup /mapuser * * 


Microsoft does not have a kadmin client and, therefore, does not 
provide an encrypted channel over which to transmit host keys. 
Thus, it is necessary to explicitly specify the machine password 
when adding it to the Kerberos V database. 


Users 

Each user in a Kerberos system must be defined as a principal.
Most individual users would be created without an instance, but it
is useful to include instances where a user performs multiple
roles. For example, an individual might have a principal record of
joe@TAURUS.TEST, but also have a principal for administrative use
stored as joe/admin@TAURUS.TEST. This latter record would match
the administrative ACL file, kadm5.acl, which as defined above
allows all “*/admin@TAURUS.TEST” users full administrative
access to the system.


Kerberized Applications 

Kerberos would not be much use without a way to integrate it 
into applications. The standard distribution includes the following 
applications, which would replace the non-kerberized versions in 
/etc/inetd.conf on Kerberos hosts: 

ftpd
klogind
kshd
telnetd

These programs are in /usr/local/sbin, and the corresponding
client applications are in /usr/local/bin. Those directories should
be placed ahead of the /usr/bin and /usr/sbin directories in standard
PATH variables. Besides these applications, Kerberos clients
also include the following programs:


* login.krb5 - A login program to allow single sign-on.

* kinit - Obtains tickets for you.

* klist - Lists currently held tickets.

* kdestroy - Destroys your tickets (should be run automatically on
logout).

* kpasswd - Changes your password on the KDC.

* ksu - Changes your default principal (like su but referring to the
KDC).


Kerberos IV Integration 

Since many platforms have Kerberos IV clients and it may be 
impossible to upgrade them to Kerberos V, I will show how to con- 
figure a Kerberos IV client to communicate with the Kerberos V 
server. The client configuration files are stored in /etc/kerberosIV 
and named krb.conf and krb.realms. They must have entries that 
define the default realm and identify the KDCs for that realm. 
Krb.conf should define the local realm in the first line and fol- 
low that by any number of lines defining realm/host entries. In 
TAURUS.TEST, a Kerberos IV krb.conf file will look like this: 

TAURUS.TEST
TAURUS.TEST kerberos.taurus.test server admin
TAURUS.TEST kerberos-1.taurus.test server

The krb.realms file provides a way to translate from a host name to
a realm name without implicitly matching them to the DNS domain.
It contains lines of the form:


host_name kerberos_realm 
domain_name kerberos_realm 


where a domain name is identified by an initial dot (.) before the
domain. A krb.realms file for TAURUS.TEST would therefore be:


taurus.test TAURUS.TEST 
.taurus.test TAURUS.TEST 


This will cover both the domain name and any hosts listed 
within that domain. With this configuration, a user can run the 
Kerberos IV client packages available under operating systems such 
as OpenBSD. To run kerberized services it is still necessary to 
upgrade to Kerberos V. 
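
The host and domain rules in krb.realms can be illustrated with a short Python sketch. It assumes that an exact host entry wins over a domain entry and that the longest matching domain suffix is tried first; those precedence rules and the function names are assumptions for this example rather than a description of any particular Kerberos IV library.

```python
KRB_REALMS = """\
taurus.test   TAURUS.TEST
.taurus.test  TAURUS.TEST
"""


def load_realms(text):
    """Split krb.realms lines into host entries and (dotted) domain entries."""
    hosts, domains = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, realm = fields[0].lower(), fields[1]
        (domains if name.startswith(".") else hosts)[name] = realm
    return hosts, domains


def realm_for(hostname, hosts, domains):
    """Return the realm for a host name, or None if nothing matches."""
    host = hostname.lower().rstrip(".")
    if host in hosts:
        return hosts[host]
    labels = host.split(".")
    # Try progressively shorter suffixes: a.b.c -> .b.c -> .c
    for i in range(1, len(labels)):
        suffix = "." + ".".join(labels[i:])
        if suffix in domains:
            return domains[suffix]
    return None


hosts, domains = load_realms(KRB_REALMS)
print(realm_for("alcyone.taurus.test", hosts, domains))
```

With the two lines shown above, both the bare domain name and every host beneath it resolve to TAURUS.TEST.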


Server Management 

Because the KDC is a critical component of a Kerberos sys- 
tem, it must be closely protected. MIT recommends that 
/etc/inetd.conf be limited to the following services to control 
access to the KDC: 

time stream tcp nowait root internal
time dgram udp nowait root internal
krb5_prop stream tcp nowait root /usr/local/sbin/kpropd kpropd
eklogin stream tcp nowait root /usr/local/sbin/klogind \
klogind -k -c -e


The eklogin service requires an encrypted session and will reject 
a non-encrypted rlogin attempt. To connect to the KDC, it is neces- 
sary to issue the following commands: 
% kinit 
Password for dsmith@TAURUS.TEST: <enter password> 
% rlogin -x aldebaran.taurus.test 


Kerberos Resources 


MIT has an extensive Kerberos Web site at: 


http://web.mit.edu/kerberos/www 


The FAQ from that site provides additional information including 
some information about connections to Windows 2000. Microsoft 
also has an extensive set of pages on Kerberos under Windows 
2000, including step-by-step instructions for interoperability: 


http://www.microsoft.com/windows2000/techinfo/planning/security/ \ 
kerbsteps.asp 


They do not provide any information about configuring NT 4.0 
clients to support Kerberos. 

Cygnus developed a Kerberos implementation (based on
Kerberos IV) for Windows NT that integrated the login screen with a
KDC, but it required the Cygwin libraries and is no longer available
from Cygnus. It has been mirrored in several locations and can now
be obtained from the site crypto.radius.net. The Kerberos protocol
is defined in RFC 1510.


David Smith has been programming for more than 30 years and has worked as a con- 
sultant for the last 10. He has designed access control systems and data transfer proto- 
cols for several applications on multiple platforms. He is currently an independent 
consultant and can be contacted at: David.Smith@acm.org.


Using Kerberos


Alex Withers 


More systems administrators than ever work
in a heterogeneous computing environment,
and it has recently become more complex to
integrate hosts into such a mixed environment. One of
the main reasons for this increased complexity is
the explosion of Microsoft and Linux products
in the data center. Linux doesn’t have problems
talking to existing UNIX machines, and Novell has
helped Linux by developing a client for their software,
but if you want existing Linux or UNIX machines to
talk to a Microsoft box, you may run into trouble.
(Samba has been one of the options for integration;
however, it is not a Microsoft-supported product.
Nonetheless, Samba works well and can meet
many needs in the integration of Windows and
Linux/UNIX machines.)

With Windows 2000, we saw an effort on
Microsoft’s part to help integrate their products into
UNIX environments. Some would argue that
Microsoft’s efforts have been dubious at best and that
Microsoft has ulterior motives. Although the latter
may be true, the former is not altogether accurate.
Windows 2000 boxes can indeed talk to UNIX-type
boxes when authentication is needed, because
Microsoft has adopted the Kerberos standard as its
authentication mechanism. Therefore, in an ideal situation,
Windows 2000 clients can authenticate to a Kerberos box, and a
UNIX client can authenticate to a Windows 2000 Active Directory
server. However, I found little or no documentation on getting a UNIX
box to authenticate to a Windows 2000 box.

In this article, I will show how to get UNIX to authenticate to a 
Windows 2000 box using Kerberos 5. By “UNIX”, I am referring to 
the platforms on which I have successfully tested and implemented 
this — mainly Red Hat Linux 7.1 and HP-UX 11.0. Because 
Kerberos 5 is standard across all UNIX-style platforms, these exam- 
ples should work on all platforms of UNIX, distributions of Linux, 
and open source BSDs. (The only caveat involves the configuration 
of PAM because that varies by platform and distribution; some 
platforms, such as OpenBSD, don’t even use PAM by default.) 

This article also focuses more on the Linux and UNIX side of 
things than Windows 2000, which means I will discuss getting 
UNIX and Linux machines to authenticate as clients to a Windows 
2000 Active Directory server. For those familiar with Kerberos, the 
Windows 2000 box will be the KDC (key distribution center). If 
you’re not familiar with Kerberos, this article only requires a work- 
ing knowledge. See the Kerberos sidebar, read David Smith’s article 
“Implementing Kerberos” in this issue of Sys Admin, or visit MIT’s 
Kerberos site: 


http://web.mit.edu/kerberos/www/ 
CONFIGURATION


This article assumes that you’ve set up a Windows 2000 Active 
Directory server according to your needs. Nothing needs to be done 
during the install or setup, so presumably any AD server will work 
for this task. Practically speaking, you will want your Domain 
Controller to be your KDC, because this is where the accounts will 
reside. I will use an example AD domain “SOMEPLACE.COM”. 
This is the top-level name space for my example AD setup. There 
is only one AD server in this example, and its DNS name is 
“ad.someplace.com” with an IP address of “10.0.0.1”. Note that 
the AD domain “SOMEPLACE.COM” is also the Kerberos 
realm. The Kerberos client (our UNIX-type box) will be called 
“client.someplace.com” with an IP address of “10.0.0.2”, and we 
will set it up to authenticate to the AD server. 

Setting up the AD server is simple and only requires a couple of
steps. Each UNIX machine that is going to authenticate using
Kerberos 5 must have a user account on the AD server. Simply add a
user and make sure its first name and login name are the host name of
that UNIX machine. Make sure the correct domain is selected in the
field next to the login name. For our example, the UNIX machine
“client” would get a first name of “client”, a login name of
“client@someplace.com”, and “mypass” as a password. Now that the
account has been created, the next step is to generate the keytab file.
You must first make sure that the Kerberos configuration utilities are
installed; they are in the package found on the Windows 2000
distribution media under the support/tools directory. To install the
package, simply run “setup.exe”. This package will give you one utility
in particular, Ktpass. Open a command shell and change to the root
directory. The following command will create a keytab file:


C:\> Ktpass -princ client/client@SOMEPLACE.COM -mapuser client -pass 
mypass -out client.keytab 


This creates the keytab file and places it in your root directory.
Notice that the principal (denoted with the -princ argument) has
a root (primary) that is “client”, an instance “/client”, and the realm
“SOMEPLACE.COM”. It is important to make sure that the realm
is typed in all caps. Now that the keytab file has been generated, it
needs to be transferred securely to “client.someplace.com”.

Next, we configure the UNIX machine, “client.someplace.com”. 
The first step is to create an /etc/krb5.conf file. Any UNIX 
machine using Kerberos 5 should have an example file in place, but 
creating the file is simple. This file is the main configuration file for 
all Kerberos 5 applications that use the Kerberos 5 library. It con- 
tains vital information such as the location of TGSs (Ticket 
Granting Service) and KDCs (Key Distribution Center). The file is 
divided into sections where each section is labeled with a heading 
([heading]). Under the headings, we have tag = value and tag = 
{subtag = value ... etc.}. The first section in the file should be the 
“libdefaults” section. In this section, we specify some simple 
configuration options that, in our example, would look like: 


[libdefaults] 
default_realm = SOMEPLACE.COM 
dns_lookup_realm = true 
dns_lookup_kdc = true 
default_tkt_enctypes = des-cbc-md5 
default_tgs_enctypes = des-cbc-md5 
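
To make the file layout concrete, the following Python sketch parses the [heading], tag = value, and tag = { subtag = value } forms described above into nested dictionaries. It is a simplified reader written for this illustration (parse_krb5_conf is a hypothetical name); it keeps only the last value of a repeated tag and does not handle groups nested inside groups, so it is not a substitute for the Kerberos library's own parser.

```python
def parse_krb5_conf(text):
    """Parse [section] headings, 'tag = value' and 'tag = { ... }' groups."""
    config, section, group = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = config.setdefault(line[1:-1], {})
            group = None
        elif line == "}":
            group = None  # end of a 'tag = { ... }' group
        elif "=" in line and section is not None:
            tag, value = (part.strip() for part in line.split("=", 1))
            if value == "{":
                group = section.setdefault(tag, {})
            elif group is not None:
                group[tag] = value
            else:
                section[tag] = value
    return config


SAMPLE = """\
[libdefaults]
    default_realm = SOMEPLACE.COM
    dns_lookup_kdc = true

[realms]
    SOMEPLACE.COM = {
        kdc = ad.someplace.com:88
    }
"""

conf = parse_krb5_conf(SAMPLE)
print(conf["libdefaults"]["default_realm"])
print(conf["realms"]["SOMEPLACE.COM"]["kdc"])
```

The sample text reuses values from this article's example domain; any other section follows the same two patterns.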


Kerberos 


Kerberos is a third-party authentication protocol that acts as
an arbitrator. This protocol allows users to authenticate and
securely access services on the network. Kerberos tries to
eliminate the dangers of sending clear-text passwords over the
network. It also provides a mechanism for a client to prove that
it really is the client and not some imposter.

The first step is for the client to send its principal to the
Kerberos authentication server (a.k.a. KDC). The principal is a
user or service that is able to authenticate using Kerberos. For a
user, the request carries the login ID and the name of the TGS
(Ticket Granting Server). The KDC makes sure the user is in the
database and generates a session key to be used between the
client and the TGS. This session key is returned along with the TGT
(Ticket Granting Ticket). The session key is encrypted by the KDC
with the user’s secret key (derived from the user’s password) and
sent back to the client from which the user requested the TGT. It
cannot be decrypted without the user’s password because the
password is the source of the secret key.

If the client wants access to a network service, the client must
have the TGT to obtain a ticket from the TGS. If the user does
not authenticate successfully with the process described previously,
then he or she cannot gain access to network services that
require Kerberos authentication. If the user has obtained the
TGT, then it can be used to obtain a ticket from the TGS. The
ticket that the client receives from the TGS is then used to
authenticate the user to the service. Now the user has access to
that network service using a secure authentication method.

As mentioned above, the AD domain SOMEPLACE.COM is
also the realm. Specify the realm by using the “default_realm” tag.
The other tags are pretty self-explanatory: use DNS to look up
names and tell the libraries which default encryption technique is to
be used. The next section, “realms”, is defined in order to configure
the realm specified earlier:


[realms]
SOMEPLACE.COM = {
    kdc = ad.someplace.com:88
    kpasswd_server = ad.someplace.com:464
}


There’s not much to this. We’re specifying where the KDC can
be found for our default realm. Notice that it is the address of the
AD server followed by a port number. This is where the tickets are
issued, and thus the KDC in this example is the TGS. We also specify
the address of the kpasswd server followed by a port number.
This will allow users to use the “kpasswd” utility for changing their
Kerberos passwords. It is also a good idea to make sure that these
ports exist in your /etc/services file:


kerberos 88/tcp kerberos5 krb5 # Kerberos v5
kerberos 88/udp kerberos5 krb5 # Kerberos v5 
kpasswd 464/tcp kpwd # Kerberos “passwd” 
kpasswd 464/udp kpwd # Kerberos “passwd” 
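
Checking for those four entries by eye is easy to get wrong, so here is a minimal sketch of doing it in Python. The function names `parse_services` and `missing_kerberos_entries` are made up for this example, and the parser assumes the usual "name port/protocol aliases # comment" layout of /etc/services.

```python
def parse_services(text):
    """Return a set of (name, port, protocol) tuples from /etc/services-style text."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue                           # not a "name port/proto" line
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            entries.add((fields[0], int(port), proto))
    return entries

# The four entries the article asks for.
REQUIRED = {
    ("kerberos", 88, "tcp"),
    ("kerberos", 88, "udp"),
    ("kpasswd", 464, "tcp"),
    ("kpasswd", 464, "udp"),
}

def missing_kerberos_entries(text):
    """Return the required entries that the given services text does not contain."""
    return REQUIRED - parse_services(text)
```

On a real machine you would pass in the file contents, for example `missing_kerberos_entries(open("/etc/services").read())`; an empty set means nothing needs to be added.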


There is one more section necessary before configuration is
finished. This section, called “domain_realm”, is used for translation.
It contains relations that map subdomains and domain
names to a Kerberos realm name. For example, on the host
“foo.someplace.com”, notice that the host name does not contain
the subdomain “HR.SOMEPLACE.COM” (which happens to be
the realm). So, we need to provide a mapping of “.someplace.com
= HR.SOMEPLACE.COM”. This section is very important and if
it’s not provided, it can cause a failure to connect to the KDC. In our
example, we would use the following:


[domain_realm] 
.someplace.com = SOMEPLACE.COM
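
To make the translation concrete, here is a simplified sketch of how a host name could be matched against [domain_realm] relations. It is an illustration of the idea under stated assumptions, not the Kerberos library's actual code: the function `realm_for_host` is hypothetical, and it assumes that an exact host entry wins over a leading-dot domain entry and that longer domain suffixes are tried before shorter ones.

```python
def realm_for_host(hostname, domain_realm, default_realm=None):
    """Map a host name to a realm using [domain_realm]-style relations.

    domain_realm is a dict such as {".someplace.com": "SOMEPLACE.COM"}.
    Keys without a leading dot name one host; keys with a leading dot
    cover every host underneath that domain.
    """
    host = hostname.lower().rstrip(".")
    if host in domain_realm:                 # exact host entry
        return domain_realm[host]
    labels = host.split(".")
    for i in range(1, len(labels)):          # ".someplace.com", then ".com"
        suffix = "." + ".".join(labels[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    return default_realm                     # no relation matched


mapping = {".someplace.com": "SOMEPLACE.COM"}
print(realm_for_host("foo.someplace.com", mapping))
```

With the single relation from our example, "foo.someplace.com" resolves to SOMEPLACE.COM, while a host outside that domain falls through to whatever default is supplied.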


This completes configuration of the client, but there are many 
more options available. The krb5.conf(5) man page provides more 
information. 

It’s now time to put the keytab file we generated on the AD 
server to use by integrating it with the client’s existing keytab file. 
This step involves using the ktutil utility on the client. Run ktutil 
as root, and you should be presented with the following prompt: 


ktutil:


Integrating the keytab requires three simple steps:


ktutil: rkt client.keytab
ktutil: wkt /etc/krb5.keytab
ktutil: q


Read the keytab file, write it to the master keytab file
/etc/krb5.keytab, and quit the program. Nothing else is required,
and Kerberos 5 should be fully operable on the client.

For a user to get tickets from the KDC, he or she must have 
an account on the AD server. Once the account is created and 
the user has logged into the client, he can now be authenticated 
using Kerberos. A ticket is usually granted by using the kinit 
utility, which requests the user’s password on the AD server. A 
user can also change his Kerberos password by using kpasswd,
which will essentially change the password used on the AD 
server. Consider the following scenario — what if I wanted the 
user to log in using his AD login name and password so that he 
has one login name and password for multiple platforms? Thus, in 
our example, a user by the name of “jsmith” would use this login 
name and the same password on both “client.someplace.com” 
and “ad.someplace.com”. The password resides on only one 
box, “ad.someplace.com”, and the authentication method has 
the advantage of using a secure method. One way of doing this 
is by using PAM. 

On UNIX systems, PAM allows multiple types of applications
to authenticate a user. PAM provides an additional layer to the
system so that the administrator has more control over authentication.
Because most of the popular applications go through this layer,
we can authenticate using Kerberos by having PAM do this for us.
So when a user logs in using telnet, ftp, ssh, imap, etc., they will
all authenticate with PAM, which in turn will use Kerberos behind
the scenes for authentication. This three-layer model will allow all
services on the UNIX box to authenticate using Kerberos.

Using PAM, we are only concerned about the UNIX client,
because a Kerberos server does not use PAM. The module discussed
here is “pam_krb5”, which can be used by PAM-ified applications
to authenticate a user. You might undermine the security of your
network if you allow all applications to authenticate with PAM. For
example, consider a situation where a pop3 client sends a plain-text
password over the network and the pop3 server uses PAM, and
thus pam_krb5, to authenticate that client. For better security, you
must use Kerberized services that take advantage of GSS-API. But
if you simply want a way to check passwords without worrying
about security, then all applications that use PAM will authenticate
through Kerberos.

Before configuring PAM, make sure that Kerberos is working by 
obtaining credentials from the KDC. If everything appears to be 
working, then it’s time to configure PAM. Essentially, this is what it 
should look like under Linux: 


auth required /lib/security/pam_env.so
auth sufficient /lib/security/pam_krb5.so
auth sufficient /lib/security/pam_unix.so nullok md5 \
    shadow likeauth use_first_pass
auth required /lib/security/pam_deny.so


I have left this example as generic as possible since different dis- 
tributions of Linux will have PAM set up differently. For instance, 
under Red Hat 7.x, all PAM configuration files are found under 
the /etc/pam.d directory. Each application has its own configura- 
tion file but they all refer to /etc/pam.d/system-auth. Check the 
documentation for more information on your distribution. Here I 
have only altered the “auth” section of the PAM configuration 
file. PAM acts like a stack, so the modules are run from top to
bottom, each governed by the control flag in the second field. The first module
simply sets up an environment. The second module, “pam_krb5”,
is set to “sufficient”, which means that if the user authenticates
with his user name and password, then it returns success and the
user is able to log in with Kerberos 5 credentials. In that case the
third module is not consulted at all: a successful “sufficient” module
ends the stack immediately (provided no earlier “required” module
has failed). But if the pam_krb5 module fails, the third module,
pam_unix, can still allow the user to log in
provided that his account and password match with the
/etc/passwd scheme.
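
The way the control flags interact is easier to see in a small model. The sketch below is a simplification written for this article's four-line stack, not PAM's real implementation: it knows only the “required” and “sufficient” flags, and `run_auth_stack` and the `outcomes` dictionary are names invented for the example.

```python
def run_auth_stack(stack, outcomes):
    """Evaluate a simplified PAM auth stack.

    stack    -- list of (control_flag, module_name) in file order
    outcomes -- dict mapping module_name to True (success) or False (failure)
    Returns (access_granted, modules_consulted).
    """
    required_failed = False
    consulted = []
    for control, module in stack:
        consulted.append(module)
        succeeded = outcomes[module]
        if control == "sufficient":
            # Success ends the stack at once unless a required module failed
            # earlier; failure of a sufficient module is simply ignored.
            if succeeded and not required_failed:
                return True, consulted
        elif control == "required":
            # Failure is remembered, but the rest of the stack still runs.
            if not succeeded:
                required_failed = True
        else:
            raise ValueError(f"control flag not modelled: {control}")
    return (not required_failed), consulted


STACK = [
    ("required", "pam_env"),
    ("sufficient", "pam_krb5"),
    ("sufficient", "pam_unix"),
    ("required", "pam_deny"),
]

# A user known to the KDC: pam_krb5 succeeds and pam_unix is never reached.
print(run_auth_stack(STACK, {"pam_env": True, "pam_krb5": True,
                             "pam_unix": False, "pam_deny": False}))
```

Flipping `pam_krb5` to False and `pam_unix` to True models a local-only account such as root, and setting both to False shows pam_deny refusing the login.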

Note that just because the third module is “sufficient”, it doesn’t
mean that the user doesn’t need an account. Any user authenticating
to the client must have an account with the same user name as on
the KDC but doesn’t need a password. Thus, the accounts on


December 2001 




“client.someplace.com” should all have a in the password 
field for /etc/passwd or /etc/shadow, etc. Otherwise, you have 
to change the “sufficient” field to “required” and the passwords 
have to be the same on both machines. For example, if “jsmith” 
telnets in and enters a user name and password that exists on the 
KDC, he will then be able to log in. If root telnets in but there is no 
“root” account on the KDC, then the pam_unix module will use 
the username and password that was given to pam_krb5. The 
pam_unix module will see whether the root username and pass- 
word are valid and allow a login if they are. In this particular 
setup, the user names must be the same on all machines. All 
machines must also have an account for that user, but the pass- 
word resides on only one machine. 

Configuration on an HP-UX, FreeBSD, or Solaris box is similar 
to the Red Hat configuration. Here is an example snippet pulled out 
of the “auth” section of /etc/pam.conf on an HP-UX 11.0 box: 


login auth sufficient /usr/lib/security/libpam_krb5.1
login auth sufficient /usr/lib/security/libpam_unix.1 \
    use_first_pass
su auth sufficient /usr/lib/security/libpam_krb5.1
su auth sufficient /usr/lib/security/libpam_unix.1 \
    use_first_pass
OTHER auth sufficient /usr/lib/security/libpam_krb5.1
OTHER auth sufficient /usr/lib/security/libpam_unix.1 \
    use_first_pass


Depending on the configuration of your machine, there may be
other entries (e.g., ftp and dtlogin) in the “auth” section. However,
they should all be the same: “sufficient” for “libpam_krb5” and
“sufficient” for “libpam_unix”. The path and name of the module
may differ depending on machine and configuration. When configuring
PAM, be sure not to inadvertently lock yourself out or create a
way to bypass security. In any case, always check the pam_krb5(8)
man page and the documentation for pam_krb5 and the other PAM
modules that come with your system.

Once PAM is configured, test it against multiple cases that 
might apply in your environment. It is a good idea to thoroughly test 
the security of your setup before setting out for production use. 
Users should now be able to log in to the client using multiple ser- 
vices (ssh, pop3, imap, etc.) and be authenticated by a Windows 
2000 Active Directory server. 


Thanks 


I would like to thank Greg Francis for all his help and for pro- 
viding the time and equipment. 


Resources 

Kerberos in the Red Hat Linux 7.1 Reference Guide: 
http://www.redhat.com/support/manuals/ \ 
RHL-7.1-Manual/ref-guide/ch-kerberos.html

MIT’s Official Kerberos Page: 
http://web.mit.edu/kerberos/www/ 

Microsoft’s Kerberos Interoperability Page: 
http://www.microsoft.com/windows2000/techinfo/ \ 
howitworks/security/kerbint.asp 

A Linux-PAM Page: 
http://www.kernel.org/pub/linux/libs/pam/

Alex Withers works as an intern at INEEL (a DOE lab) for the Cyber Security team. He
will be returning to Gonzaga University to finish his education and then marry his
wonderful fiancée Anne. He can be reached at: awithers@gonzaga.edu.
A Web service needn’t use the Web. It needn’t even be a service.
In fact, no one agrees on exactly what a Web service is, but 
there is a strong sense that, by golly, they’re important. 

To me, a Web service is a program that encourages other programs 
to send it requests, and that also could be (and often is) implemented 
via a set of Web pages. Put another way, a Web service is a networked 
program for which a CGI script could be used as a GUI. 

Clay Shirky, an O’Reilly analyst, polled a few experts to see how 
they defined Web services: 


My proposed 3 definitions were: 

1. Web Services are an attempt to do for computing what 
the Web itself did for publishing: to create a simple, 
loosely coupled method for two arbitrary applications to 
communicate with one another. 


2. Web Services are an attempt to define XML interfaces 
for applications and business processes that can be 
exposed over the Internet. 

3. Web Services are applications that have SOAP interfaces 
accessible via HTTP. 

Unsurprisingly, there was universal assent to the first definition,
and near-universal grumbling about the last one,
often on the grounds that while that was what was making
it into the press, it was far too narrow.


In this article, I'll describe a Web service I created at O’Reilly, 
tell you how to install the software you need to create your own 
Web services, and demonstrate two sample clients and a server. 


ISBNs 


At O’Reilly, I’m involved in writing programs that analyze the 
technical book market. Part of this involves gathering information 
about books, and while we have a few sources available to us that reg- 
ular consumers don’t, we still rely on Amazon for some of our data. 

Every book sold in stores (or on Amazon) is identified by a sin- 
gle number: the 10-digit ISBN. (The last “digit” can actually be an 
X, since their checksum uses base 11.) You can find out the ISBN 
number for any book by searching for it on Amazon. 
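
The base-11 checksum is simple enough to verify by hand or in code. The sketch below uses the standard ISBN-10 rule (weight the ten characters 10 down to 1, count a final X as ten, and require the total to be divisible by 11); the function name `isbn10_is_valid` is just a label for this example.

```python
def isbn10_is_valid(isbn):
    """Return True if the string is a well-formed ISBN-10 with a correct check digit."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10                      # "X" is only legal as the last character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value    # weights run 10, 9, ..., 1
    return total % 11 == 0


print(isbn10_is_valid("0596000278"))   # the ISBN isbnfind reports for "programming perl"
```

For 0596000278 the weighted sum is 187, which is 11 times 17, so the number checks out.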

I wrote an LWP program that automated this process so that I 
never had to visit Amazon myself. I called it isbnfind: 


% isbnfind programming perl 
0596000278 


This is great, but if I want to make it available to the rest of the world, I 
have two choices. I could distribute my isbnfind program and hope that 


PACKAGES USED
Frontier::Client
Frontier::Daemon
LWP::UserAgent

CREATING XML-RPC
WEB SERVICES


people have all the necessary modules and libraries installed. Or, I could 
make my program available as a Web service so that people can connect 
to it with a client. This also allows me to keep my source code private, 
and for this reason some people hail Web services as a salvation against 
the scourge of Open Source. I am not one of those people. 


XML-RPC or SOAP? 


In the example shown above, isbnfind pretends to be a human 
typing programming perl into the little search box at the top left of 
Amazon’s main page. Amazon’s Web pages obviously constitute a 
Web application. But how about a Web service? My isbnfind pro- 
gram tries pretty hard to treat it as one, but it’s brittle — it will fail as 
soon as Amazon makes a substantial change to the formatting of their 
Web pages, and they inevitably will. Per my definition, Amazon’s site 
isn’t a Web service, because Amazon doesn’t encourage other pro- 
grams to riffle through its databases for ISBN numbers. My program 
has to masquerade as a human to get its request answered. 

There are two popular protocols through which a Web site can 
encourage programmatic use: XML-RPC and SOAP. We can make 
ISBN numbers available as a Web service by turning isbnfind into 
a program that speaks XML-RPC. 

We could use SOAP instead, which many programmers prefer to
XML-RPC. XML-RPC is simple and lightweight; SOAP is more
featureful (some would say bloated). SOAP gets more mention in
the press, in part because of Microsoft’s SOAP development efforts.

You can use both XML-RPC and SOAP with Perl; in this article, 
I'll show you how to use Ken MacLeod’s Frontier::Client and 
Frontier::Daemon modules to implement an XML-RPC server and 
client. I’m a big fan of keeping simple tasks simple; converting a 
book title into an ISBN doesn’t require the extra overhead of SOAP, 
so I'll stick with XML-RPC. (If this were an article about SOAP, I’d 
be recommending Paul Kulchenko’s SOAP::Lite module.) Paul also 
distributed an XMLRPC::Lite module, and Randy Ray created an 
RPC::XML module. Both are available on the CPAN and may be a 
better match for your needs than the Frontier modules. 

You don’t have to know anything about XML to use XML-RPC, 
and the only thing you have to know about RPC (which stands for 
“Remote Procedure Calls”) is that it’s a way for you to invoke 
subroutines on someone else’s computer. 

XML-RPC is a simple protocol. Usually, the client encodes its 
request (“invoke this subroutine with these arguments”) as a wee 
XML document and sends it via HTTP to a server. The server 
composes its own wee XML document in response (“here’s what 
the subroutine returned’’) and sends it back. 

This is similar to what happens when you read a Web page: your 
browser sends a request via HTTP to the Web server, and the Web 
server sends a response via HTTP back to the browser. The only 
difference is that a Web application’s response is an HTML Web 
page, while an XML-RPC Web service’s response is the result of a 
subroutine (or “method”, in XML-RPC parlance). 
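
If you are curious what those wee XML documents look like, the same encoding can be produced without any network at all. The sketch below uses Python's standard xmlrpc.client module rather than the Frontier modules this article is about, so treat it as a parallel illustration; returning the ISBN as a string is my assumption, chosen so that a leading zero or a trailing X survives the trip.

```python
import xmlrpc.client

# The request: "invoke the isbn method with these three arguments".
request_xml = xmlrpc.client.dumps(("dava", "sobel", "longitude"),
                                  methodname="isbn")
print(request_xml)

# What a server does on receipt: recover the method name and arguments.
params, method = xmlrpc.client.loads(request_xml)

# The response: "here's what the subroutine returned".
response_xml = xmlrpc.client.dumps(("0140258795",), methodresponse=True)
print(response_xml)

# What the client does on receipt: recover the single return value.
(result,), _ = xmlrpc.client.loads(response_xml)
```

Both documents are plain text, which is why the debugging output of an XML-RPC client is so readable.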
To create an XML-RPC Web service, you create a server that
exposes methods (i.e., Perl subroutines) to the outside world. 
Making your subroutines available is simple: just include them in a 
program and create a Frontier::Daemon object to expose them to 
the Internet. I’ll turn now to the details of getting your computer 
ready for XML-RPC. 


Installation 

To create an XML-RPC server, you need three things besides 
Perl: the XML::Parser and Frontier::RPC modules, and the expat 
library (on which XML::Parser depends). Follow these three steps. 


1. Install the expat XML parsing library from: 
http://sourceforge.net. 
Both Windows binaries and source code for Unix/Linux compi- 
lation are available. 

2. Install the XML::Parser module. 
On UNIX, you can use the CPAN.pm module bundled with Perl 
to install modules available on the Comprehensive Perl Archive 
Network: 


% perl -MCPAN -e 'install "XML::Parser"'

If you’re using ActivePerl on Windows, XML::Parser will 
already be installed. 


3. Install the Frontier::RPC modules. On UNIX/Linux, you can use 
the CPAN.pm module again: 


% perl -MCPAN -e 'install "Frontier::Daemon"'


With ActivePerl, type ppm from your command prompt and then 
install Frontier-RPC. 




P52 s www.tpj.com 


Creating a Client 

Servers are more important than clients — after all, if there’s no 
one to answer your requests, you needn’t bother asking — but since 
clients are simpler, I’ll show those first. I’ll demonstrate two: one
stripped down to its bare essentials so that you can see the critical 
elements, and the “real” client I used when writing this article. 


Client 1: The Platonic Ideal 
Here’s a very simple XML-RPC client. We use the 
Frontier::Client module, create a new Client object primed to 
connect to port 8088 of labs.oreilly.com, and finally invoke the 
isbn method with the command-line arguments. 
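
#!/usr/bin/perl

# Jon Orwant, 9/9/01
# isbnclient

use Frontier::Client;

# Connect to the XML-RPC service on port 8088 of labs.oreilly.com
$client = Frontier::Client->new(url => "http://labs.oreilly.com:8088/RPC2");

# Invoke the remote isbn method with the command-line arguments
print $client->call("isbn", @ARGV), "\n";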


Presuming the XML-RPC server at labs.oreilly.com is running on 
port 8088, we can treat it just like an isbnfind that is available to 
everyone instead of just me: 


% isbnclient programming perl
0596000278 


Note: the RPC2 at the end of the URL is necessary for XML-RPC 
services created with the Frontier::Daemon module I used, even 
though it has nothing to do with isbnfind and isn’t mentioned by 
name in the server you'll soon see. Omitting the RPC2 is a common 
mistake among XML-RPC novices. 


Client 2: The Developer’s Client 
The Platonic ideal client looks pretty, but it’s not what I actually 
used when developing the code for this article. Here’s the real code: 


#!/usr/bin/perl
use Frontier::Client;

# Talk to a server on this machine, and show the XML that is exchanged
$client = Frontier::Client->new( url   => "http://localhost:8088/RPC2",
                                 debug => 1);

print $client->call("isbn", @ARGV), "\n";


There are two differences between this and the previous client. First, I 
tested everything on my laptop in case DNS or firewall problems pre- 
vented my laptop from accessing labs.oreilly.com. Using localhost 
instead of labs.oreilly.com kept both server and client local.

Second, I used debug => 1 to turn on debugging, letting me see 
the exact XML that’s being sent from client to server (the request) 
and from the server back to the client (the response): 


% isbnclient dava sobel longitude
---- request ----
<?xml version="1.0"?>
<methodCall>
<methodName>isbn</methodName>
<params>
<param><value><string>dava</string></value></param>
<param><value><string>sobel</string></value></param>
<param><value><string>longitude</string></value></param>
</params>
</methodCall>
---- response ----
<?xml version="1.0"?>
<methodResponse>
<params>
<param><value><i4>0140258795</i4></value></param>
</params>
</methodResponse>
0140258795
This example highlights a feature of isbnclient, which is really an
Amazon feature: you don’t have to match the title exactly, and can 
specify as many keywords as you like. isbnserver, which we’ll 
see in the next section, simply grabs the first ISBN it sees from the 
ranked list of books that Amazon suggests for those keywords. 


Creating the Server 
Our XML-RPC server is significantly larger than either client,
but only because it needs to include the transformed isbnfind.
Here, we use the Frontier::Daemon module instead of the 
Frontier::Client module. The lines in bold highlight the critical 
lines for creating an XML-RPC service. 

#!/usr/bin/perl

# Jon Orwant, 9/9/01
# isbnserver

use Frontier::Daemon;
use LWP::UserAgent;

# Create the LWP UserAgent object,
# used to send requests to Amazon
$ua = new LWP::UserAgent;
$ua->agent("TPJ/0.1 " . $ua->agent);

# Create the XML-RPC service
Frontier::Daemon->new(LocalPort => 8088,
                      methods   => { "isbn" => \&isbn });

# Given keywords, search for them on Amazon,
# and return the ISBN of the first book found.
sub isbn {
    my ($keywords) = "@_";
    $keywords =~ s/ /%20/g;   # Replace each space with "%20"
    $keywords =~ s/'/%27/g;   # Replace each apostrophe with "%27"

    # Prepare the request for sending to Amazon
    my $req = new HTTP::Request POST =>
        'http://www.amazon.com/exec/obidos/search-handle-form/103-2425912-6530239';
    $req->content_type('application/x-www-form-urlencoded');
    $req->content("field-keywords=$keywords");

    # Send the request to Amazon
    my $res = $ua->request($req);

    # Examine the response and return the first ISBN found
    if ($res->is_success) {   # If we got a page back from Amazon...
        $content = $res->content;
        ($isbn) = ($content =~ m!<a href=/exec/obidos/ASIN/([\dX]+)!gism);
        if ($isbn) { return $isbn }
        else       { return "No ISBN found." }
    } else {
        return "Amazon changed their page.";
    }
}


After using the Frontier::Daemon and LWP::UserAgent modules,
isbnserver creates an LWP agent that will be used for every request to
Amazon. (Each request identifies itself as being a “TPJ” browser,
version 0.1.) The server is then launched by creating a new
Frontier::Daemon object, providing (on port 8088) exactly one method:
isbn, which is defined in the subroutine at the end of the program.
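
The two fiddly parts of the isbn subroutine, escaping the keywords and fishing the ISBN out of the returned page, can be tried on their own without contacting Amazon. The sketch below does both with Python's standard library; the helper names `build_form_body` and `first_isbn` are invented for this example, and the sample link in the usage line is made up to match the pattern the server looks for.

```python
import re
from urllib.parse import urlencode

def build_form_body(*keywords):
    """Build the form-encoded POST body for the given search keywords."""
    return urlencode({"field-keywords": " ".join(keywords)})

# Same pattern the server uses: an unquoted ASIN link followed by digits or X.
ASIN_LINK = re.compile(r"<a href=/exec/obidos/ASIN/([\dX]+)", re.IGNORECASE)

def first_isbn(html):
    """Return the first ISBN linked from the page, or None if there is none."""
    match = ASIN_LINK.search(html)
    return match.group(1) if match else None


print(build_form_body("dava", "sobel's", "longitude"))
print(first_isbn("<a href=/exec/obidos/ASIN/0140258795/ref=sr>Longitude</a>"))
```

One difference from the hand-rolled escaping: `urlencode` writes a space as "+" rather than "%20". Both spellings are accepted in form-encoded data, and letting a library do the escaping also covers characters the two substitutions in the listing do not, such as "&".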

As mentioned earlier, this code is brittle: the 103-2425912- 
6530239 in the Amazon URL is ample evidence that they don’t 
intend this URL to stick around forever. But as a demonstration 
of how quickly you can throw an XML-RPC interface around a 
conventional program, it serves its purpose. 
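
To show how little is tied to Perl here, the same kind of service can be sketched with Python's standard library. This is a parallel illustration and not a port of isbnserver: the `CATALOGUE` dictionary is a hypothetical stand-in for the Amazon lookup, and `start_server` and `lookup` are names chosen for the example.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the Amazon search: a tiny in-memory catalogue.
CATALOGUE = {
    "programming perl": "0596000278",
    "dava sobel longitude": "0140258795",
}

def isbn(*keywords):
    """Return the ISBN for the given keywords, or a message if none is known."""
    return CATALOGUE.get(" ".join(keywords).lower(), "No ISBN found.")

def start_server(host="127.0.0.1", port=0):
    """Serve the isbn method in a background thread.

    Port 0 asks the operating system for any free port; the article's
    server uses 8088.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(isbn, "isbn")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def lookup(server, *keywords):
    """Call the running service the way a remote client would."""
    host, port = server.server_address
    with xmlrpc.client.ServerProxy(f"http://{host}:{port}/RPC2") as proxy:
        return proxy.isbn(*keywords)
```

A short session would be `server = start_server()`, then `lookup(server, "programming", "perl")`, then `server.shutdown()` when you are done. The client URL ends in /RPC2 here as well; that path is one of the two this server class answers on by default.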


What Now? 


XML-RPC is an encoding scheme. This is level two of what Clay
Shirky calls the “consensus stack” that people can use to explain the
Web services universe. Below the encoding layer, at the lowest level,
is transport: roughly, how you get the bits from server to client (or
peer to peer). The transport we’re using here is HTTP, but it could
almost as easily be Jabber instant messaging or regular email.

There are two layers above encoding: description and discovery. 
Description is a formalized way of describing how programs can talk 
to your Web service; WSDL (Web Services Description Language) is
the best known, although no description layer is commonly used yet. 

Above description is discovery — making it possible for people 
and programs to learn about your Web service, typically by visiting 
a repository that describes in a structured fashion what the Web 
service offers. UDDI (Universal Description, Discovery, and 
Integration) is a business-oriented repository of Web services that 
allows companies to describe their services to customers and part- 
ners. There is also a nascent Web service repository at: 


http://www.salcentral.com/salnet/webserviceswsdl.asp


although it has only a fraction of all the Web services out there. A 
better directory service for Web services is needed. After all, you 
can’t just write a magazine article about your Web service and 
assume everyone will learn about it. 
I’m holding out but not getting an answer.
I want to do right by you. 
But I’m finding out that cheating gets it faster. 


— Jimmy Eat World, “Get It Faster” 


Back in the days, the Latin-1 character set was the de facto
standard for the Internet. It was just Latin letters plus a few 
accents, and that was enough for most of the Western 
European languages, but it left a whole lot of other languages out in 
the cold to try to make do with mutually incompatible encodings. Now 
we have Unicode, a single character set that can encode all the world’s 
languages, whether they’re written in accented Latin letters, like 
Icelandic or Vietnamese or Navajo, or in some quite different script, 
like Greek, Armenian, Chinese, Cherokee, or Hindi, to name a few. 
In an ideal world, all computers, applications, operating sys- 
tems, and protocols would have all the fonts and support for turning 
Unicode data into correctly formatted text on the screen. But the 
real world is stubbornly less than ideal, and for a very long time we 
will still have to deal with some systems that can’t reliably show 
much more than US-ASCII. To cope with that fact, I wrote a Perl
module called Text::Unidecode, which takes Unicode text using any 
writing system, and tries to convey it using just US-ASCII. This 
article is about how complicated that task can be, how it should have 
taken me years — and how I actually managed it in just a few days. 


Basics of Different Scripts 

To explain why I made Text::Unidecode work the way it does, I 
need to describe some basic principles of world writing systems. 
Dealing with all the writing systems in Unicode has made me appre- 
ciate that while they are all superficially quite different, they are 
mostly just variations on a few basic themes. 

Most writing systems basically work on this plan: 


1. Figure out the sounds in what you’re trying to express. 


This sometimes involves some arbitrary decisions, such as
what to do when a word’s pronunciation can change freely (like the
fact that “heat” has a “t” sound on its own, but put an unstressed vowel
after it as in “heat up a bagel” and it sounds like “heed”). Most writing
systems, however, end up settling on these points without much
bother and often without any conscious thought.


2. Possibly toss out distinctions that aren’t crucial. 


Here, written English tosses out the distinction between stressed 
and unstressed syllables (so you can’t see the difference between a 


MODULES USED
Text::Unidecode

UNIDECODE!

Burke


cold com-press, and having to com-press data). Written Latin tossed 
out all distinctions between stressed and unstressed vowels, and 
between short and long. And Hebrew just usually tosses out its 
vowels altogether. And if you’re Cherokee, you toss out all the 
information about what tones you have in a word, and also toss out 
some sound distinctions, like between /k/’s and /g/’s. 


3. Group sounds according to some consistent scheme. 


This might mean just breaking between the words, or it might 
consider where the syllables stop and start, or it might mean both 
these things. 


4. Write that out.


This might mean going from a sound to a symbol one at a
time and without appeal to context. Or it might involve some
context, as with Spanish, where you have to figure,
“well, I want a /k/ sound, and I’d normally write that c, but it’s
going before an /i/ sound here, so I need to write it as qu instead”
(as in “Quito”, the city). Or it might mean figuring (as with
Japanese kana and Cherokee syllabics) “this whole syllable is
/ki/” and looking up the way to write that syllable in the chart of
all possible whole syllables.


5. Have the computer encode it. 


Since we’re talking about computer text, we also have an addi- 
tional step: when you “write out” what you mean, by moving your 
fingers over the keyboard, the computer eventually saves it to disk 
in some encoding. Surprisingly, the same character on the screen 
could be represented in fundamentally different ways, in different 
encodings. 


A good example of this process, in a non-Western writing sys- 
tem and its encoding, is Divehi, the language spoken (and written) 
in the Maldives, an archipelago southwest of India. Divehi is written 
right-to-left, with vowels written as marks over or under the 
consonants. The word “divehi” itself (the Divehi word for 
“Divehi”!) illustrates how this works.


1. Figure out the sounds of what we’re writing.

The word we’ve expressed is represented phonetically as /divehi/.

2. Possibly toss out distinctions that aren’t crucial.


Divehi writing doesn’t distinguish stressed syllables from 
unstressed syllables. But it does happen to distinguish 10 different 
vowels. 


3. Group the sounds together. 


We note word breaks, since we’ll represent those as whitespace 
later. But more importantly, we group the word’s sounds into sylla- 
bles. /divehi/ becomes three syllables, /di/ /ve/ /hi/. 
4. Write that out.


We write right-to-left, writing each consonant on the baseline, 
and each vowel as a symbol above it or below it: 


[Figure: the word /divehi/ written in Divehi script]


For example, note that the sound /e/ is represented by a mark that looks
like a little “c”, and which goes over the consonant that starts that
syllable. This works fine for syllables that are just a consonant and a vowel,
but what about consonants and vowels elsewhere? The word for “that”
is just /e/, and how is it written? Divehi wisely uses a placeholder here:
a letter that stands for the lack of a preceding consonant.


[Figure: /e/ written in Divehi script, with the placeholder consonant romanized as X]


(Here we represent the placeholder consonant in our ad hoc roman- 
ization as X.) 

When a consonant doesn’t begin a syllable (as with the /n/ in 
/ran/, the word for “gold”), it’s given a placeholder vowel, which 
appropriately looks like a little zero: 


[Figure: /ran/ written in Divehi script, with the placeholder vowel romanized as 0]


You can even get words where every syllable has a placeholder 
of some sort, as in /ain/ (meaning “a school of fish”): it’s broken 
into /a/, /i/, and an extra /n/; the first two each get a placeholder 
consonant, and the last bit gets a placeholder vowel: 


[Image: the word /ain/ written with two placeholder consonants and a placeholder vowel] 


5. Encoded as new characters. 


Now, there are 10 vowels in the language plus the placeholder 0 for 
null vowel; and there are 23 consonants plus the placeholder X for 
null consonant. So in the above system, there’s a maximum of 11 * 
24, or 264 possible written syllables. We could make an encoding 
based on 264 codepoints (i.e., slots in the character set), where we 
encode each whole syllable at once. This way makes sense because 
that’s how they’re drawn on the screen, so when you want to draw 
the di in the word divehi, the di is encoded as just a single code- 
point, and you fetch the font character for that. 

An alternative is to save each element (as d is, and as i is) as a 
character on its own, in a character code of its own. This is useful 
in that it reflects how you type, a letter at a time; and if you want 
to change an element, you shouldn’t have to re-key the whole syl- 
lable, but should be able to just hit delete and change one element. 

It so happens that Unicode’s representation of Divehi (in the 
character codes 0x0780 to 0x07B0) is the latter: divehi is repre- 
sented not as a character di, a character ve, and a character hi; but 
as six characters: 


0x078B = Divehi letter "d" 
0x07A8 = Divehi letter "i" 
0x0788 = Divehi letter "v" 
0x07AC = Divehi letter "e" 
0x0780 = Divehi letter "h" 
0x07A8 = Divehi letter "i" 


Don’t let the issue of writing direction here confuse you: a file 
consisting just of the word “divehi” would start with the byte sequence 
for “d” and end with the byte sequence for “i”. The fact that Divehi 
is written “backwards” is just a matter for display on the screen; 
Unicode doesn’t make an issue of this, and encodes things “in 
logical order”, as it’s called. (The reader is invited to consider whether 
all alternatives are illogical orders.) 
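
To make the point about logical order concrete, here is a minimal sketch that builds the word from the six code points listed above and looks at its bytes. It assumes the file is stored as UTF-8, which the text doesn’t actually specify; the variable names are mine. 

```perl
use strict;
use warnings;
use Encode qw(encode);

# "divehi" in logical (typing) order: d, i, v, e, h, i.
my @codepoints = (0x078B, 0x07A8, 0x0788, 0x07AC, 0x0780, 0x07A8);
my $word = join '', map { chr } @codepoints;

# Assumption: the file is UTF-8, where each of these characters takes two bytes.
my $bytes = encode('UTF-8', $word);
printf "%d characters, %d bytes\n", length($word), length($bytes);

# The byte stream starts with "d" (U+078B) and ends with "i" (U+07A8),
# whatever direction the display code later draws them in.
printf "first: %s  last: %s\n",
    uc unpack('H*', substr($bytes, 0, 2)),
    uc unpack('H*', substr($bytes, -2));
```

Nothing in the stored data says “right to left”; that decision is left entirely to whatever draws the text. 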




So if someone emails you in Divehi saying simply “ran!” 
(“gold!” — maybe it’s a grizzled prospector staking claim there in 
the Indian Ocean), that would be encoded as five characters: 


0x0783 = Divehi letter "r" 
0x07A6 = Divehi letter "a" 
0x0782 = Divehi letter "n" 
0x07B0 = Divehi letter null vowel 
0x0021 = Exclamation mark 


In an ideal world, you’d get that email, and when it made its way 
to your mail program and to your screen, it would look like this: 


[Image: the word /ran/ in Divehi script, followed by an exclamation mark] 


But unless your mail program (and its OS) knows how to deal with 
Divehi (which includes having the fonts, knowing how to compose 
the vowels over/under the consonants, and going right to left), 
you’re more likely to see this: 


If that’s all we can see, we’re left wondering what on Earth is meant 
by email consisting of an inscrutable four-character word and an 
exclamation point. Or, maybe the programmer of the mail program 
was clever, and he has his program show undisplayable characters 
using their character codes: 


[0783][07A6][0782][07B0]! 
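
A fallback display like that takes only one substitution. This is a sketch of the idea, not the code of any particular mail program; the function name show_codes is made up for illustration. 

```perl
use strict;
use warnings;

# Replace every character outside US-ASCII with its code point in brackets.
sub show_codes {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7f])/sprintf("[%04X]", ord($1))/ge;
    return $text;
}

# The "ran!" message from above: r, a, n, null vowel, exclamation mark.
my $ran = "\x{0783}\x{07A6}\x{0782}\x{07B0}!";
print show_codes($ran), "\n";   # prints [0783][07A6][0782][07B0]!
```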


While this doesn’t exactly lose any information, it doesn’t really 
blaze with significance either, unless we have a Unicode character 
chart on hand. I do have a Unicode character chart — but since it’s 
the size and weight of a large Encyclopedia Britannica volume, it’s 
a bit hard to imagine keeping it “on hand” wherever you go. 

If we’re using a system that can’t handle all of Unicode, that 
maybe can’t be trusted with anything but US-ASCII, it’d be nice if 
we could see the Divehi word expressed in Latin letters, as plain old “ran!”. 
That’s what I wanted Text::Unidecode to do. And if all the world’s 
scripts were as simple as Divehi seems so far, then writing 
Text::Unidecode would have been barely a few days’ work. But it 
turns out that not even Divehi is really as simple as that. 


When the Going Gets Messy, the Messy Turn Pro 


Writing a program to “parse” Divehi characters and spit out 
Roman letters (what we in the biz call “transliteration”) seems 
merrily simple so far. But it gets stranger. 

Previously, I said that there are 264 possible written syllables in 
Divehi, 11 * 24, 11 for the 10 vowels plus the null vowel, and 24 
for 23 consonants plus the null consonant. The astute reader might 
have realized that this includes the possibility of a syllable consist- 
ing of a null consonant and a null vowel. And, a syllable consisting 
entirely of a placeholder consonant for a placeholder vowel seems 
the sort of thing that could never actually happen. Sometimes, how- 
ever, things that don’t actually occur are allowed to exist in code 
tables so that there aren’t any gaps in the lookup grid that says “this 
consonant plus this vowel makes this displayable pair”. But the 
shocking truth is that the Divehi written syllable consisting of null 
consonant plus null vowel actually does occur — in fact it has two 
meanings. This is where things get messy. 

The first meaning of this null syllable is to express a sound that 
has no letter of its own: the glottal stop sound. English has this 
sound between the two vowels in the interjection “uh-oh!”; but in 
Divehi it occurs in normal words, like “bo’”, which means “frog”, 
or “hurihákame’”, which means “everything” — although it may 
seem ironic that the word for everything ends in a double nothing. 

The second meaning of the null syllable is to make the following 
consonant last longer. Long consonants (“geminates” in linguistic 
jargon) are pretty rare in English; the closest we come is the 
double-t sound in “cat tail”. But they’re common in many lan- 
guages (Italian, Finnish), and come up plenty in Divehi, in words 
like “ba-ppa” (“father”) or “ta-yyá-ru” (“ready”). 


Now, a program that reads the Unicode encoding of this (taX0yáru) 
should presumably turn it into something like “ta-yyá-ru”, doubling 
the following consonant (here, a “y”). And where the X0 syllable 
occurs at word-end, it should be replaced by some good symbol for 
the glottal stop sound. The apostrophe will do for that, since it’s not 
otherwise in use in Divehi script. 

Another way to express this idea is that X0 and a consonant 
should turn into two of that consonant (X0y to yy); other X0s should 
turn into an apostrophe; and otherwise X and 0 just delete. This is 
a snap for regexps: 


s/X0(\w)/$1$1/g; 
s/X0/'/g; 
s/(\w)/$Divehi2roman{$1}/g; 


...except that we can go use the real Unicode characters: 


# \x{0787}\x{07b0} is "X0", the null syllable 
# \p{InThaana} is \w for just Divehi characters 
# ("Thaana" is the official name of the Divehi script) 
# See perldoc perlre for more about \p{...} 

s/ \x{0787}        # the null consonant 
   \x{07b0}        # the null vowel 
   (\p{InThaana})  # and some letter 
 /$1$1/gx; 

s/\x{0787}\x{07b0}/'/g; 

s/(\p{InThaana})/$Divehi2roman{$1}/g; 


Then we just make sure we’ve filled out %Divehi2roman with 
things like: 


"\x{0786}" => "k",  # Divehi "k" => Roman "k" 
"\x{0787}" => "",   # Divehi null consonant => nullstring 
"\x{0788}" => "v",  # Divehi "v" => Roman "v" 
"\x{0789}" => "m",  # Divehi "m" => Roman "m" 


"\x{07A6}" => "a",  # Divehi short a => Roman "a" 
"\x{07A7}" => "A",  # Divehi long a => Roman "A" 
"\x{07A8}" => "i",  # Divehi short i => Roman "i" 
"\x{07A9}" => "I",  # Divehi long i => Roman "I" 


"\x{07b0}" => "",   # Divehi no-vowel => nullstring 


This constitutes a full working transliterator program, built from 
three regexps and one hash¹, which does a fine job of turning 
Unicode text in Divehi into US-ASCII. The fact that the Divehi, in 
proper script, would have been written right to left, with vowels 
superimposed on the preceding consonants, doesn’t show up in the 
Unicode representation, so our program doesn’t need to deal with it. 
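
Put together, the three substitutions and the hash might look like the sketch below. The table is deliberately tiny: it holds the letters that appear in this article’s examples, plus b (U+0784), p (U+0795), and o (U+07AE), which I’ve taken from the Unicode Thaana chart so that the “bappa” and “bo’” cases can be exercised. The sub name divehi2roman is made up, and I’ve used an explicit code point range where the listing above uses \p{InThaana}. 

```perl
use strict;
use warnings;

# A partial table, enough for the examples in the text.
my %Divehi2roman = (
    "\x{0780}" => "h", "\x{0782}" => "n", "\x{0783}" => "r",
    "\x{0784}" => "b", "\x{0786}" => "k", "\x{0787}" => "",
    "\x{0788}" => "v", "\x{0789}" => "m", "\x{078B}" => "d",
    "\x{0795}" => "p",
    "\x{07A6}" => "a", "\x{07A7}" => "A", "\x{07A8}" => "i",
    "\x{07A9}" => "I", "\x{07AC}" => "e", "\x{07AE}" => "o",
    "\x{07B0}" => "",
);

sub divehi2roman {
    my ($text) = @_;
    # Null syllable followed by a Thaana letter: double that letter.
    $text =~ s/\x{0787}\x{07B0}([\x{0780}-\x{07B1}])/$1$1/g;
    # Any null syllable still left is a glottal stop.
    $text =~ s/\x{0787}\x{07B0}/'/g;
    # Everything else is one table lookup per letter; letters missing
    # from this partial table are passed through untouched.
    $text =~ s/([\x{0780}-\x{07B1}])/exists $Divehi2roman{$1} ? $Divehi2roman{$1} : $1/ge;
    return $text;
}

print divehi2roman("\x{0783}\x{07A6}\x{0782}\x{07B0}!"), "\n";               # ran!
print divehi2roman("\x{0784}\x{07A6}\x{0787}\x{07B0}\x{0795}\x{07A6}"), "\n"; # bappa
print divehi2roman("\x{0784}\x{07AE}\x{0787}\x{07B0}"), "\n";                 # bo'
```

The order of the substitutions matters: the doubling rule has to run before the apostrophe rule, and both have to run before the table lookup deletes the bare placeholders. 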

As we consider doing the same for the dozens of other scripts in 
Unicode, we face the unpleasant news that Divehi, for all this 
strangeness with null syllables, is uncommonly straightforward as 
writing systems go. What took three regexps for Divehi (after a bit of 
research), could take a dozen for Hindi (which is partly like Divehi, 
but partly not). As for Thai, the transliterator would have to guess at 
syllable and word boundaries, since Thai doesn’t normally mark 
them (yesitallrunstogether!). However, you need to know where they 
are in order to know which way to transliterate some characters. 

And it gets worse. My Library of Congress ALA-LC 
Romanization Tables reference for Arabic goes on for pages and 
pages. It notes, for example, that one character (Unicode 0x0629, 
called “teh marbuta”, which looks oddly like a “6”) is to be translit- 
erated as “h” when it’s on nouns that are indefinite or preceded by 
a definite article, or as “t” when it is on construct state nouns, or as 
“tan” when it’s an adverb suffix. I have not the faintest idea what 
the “construct state” is or how to identify it, or how to tell an 
indefinite noun or an adverb from any other kind of word in Arabic. I am 
rather sure, however, that it cannot be done with a mere regular 
expression, and that is not something I say lightly! 

In short, it was looking as if producing a system that could take 
Unicode text and spit out US-ASCII romanization, was going to 
involve phenomenal amounts of work. Some scripts are simpler 
than Divehi, but many are much more complicated. The situation I 
was facing is exactly the sort of thing that programmers have in 
mind when they talk about the “eighty-twenty rule”. 


The Eighty-Twenty Rule 

With writing systems and computer encodings of them, things 
are pretty straightforward most of the time, but still manage to get 
a bit messy some of the time, and very messy every now and then. 
Most of the problem can be treated with a cheap hack or two, but 
to deal with the rest of the problem, you have to write code that is 
longer and introduces whole new levels of complexity into your 
program. In other words, to pick some favorite arbitrary numbers, 
you can deal with 80% of the problem by writing only about 20% 
of the code (or expending 20% of the time or effort) that it would 
take to treat the whole problem. 

A non-linguistic example of this is parsing addresses out of 
“From:” lines in email headers. This is a notoriously and point- 
lessly hard task to do “by the book” (where “the book” is RFC 
2822). However, a quick look at my mail spool file shows that you 
get 57% of the addresses parsed correctly if you just look for lines 
matching the pattern From: Their Name <user@host>, with a 
fairly constrained idea of what can be in user or host. There can be 
no spaces, no parens, no backslashes, no quotes, no \@’s or anything 
else that actually is RFC2822-legal. Add quotes to that pattern, as in 
From: "Their Name" <user@host>, and you get another 25%. 
Another 11% is gained by From: user@host, and it’s sharply 
diminishing returns from there. From: user@host (Their Name) 
is another few percent, and after that it’s off into hard-to-parse and 
mercifully rare things like From: Pete(A wonderful \) chap) 
<pete(his account)@silly.test(his host)>. 
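
As a rough sketch of that “cheap hack” end of the scale, the four common formats can be tried in turn. The patterns and the function name parse_from are my own illustration, not the code I ran against my mail spool, and they are nowhere near RFC 2822. 

```perl
use strict;
use warnings;

# A deliberately narrow idea of "user" and "host": no spaces, parens,
# angle brackets, quotes, backslashes, or extra @ signs.
my $user = qr/[^\s()<>"\\\@]+/;
my $host = qr/[^\s()<>"\\\@]+/;

# Returns (address, name) for the common formats, or an empty list.
sub parse_from {
    my ($line) = @_;
    # From: "Their Name" <user@host>
    return ($2, $1) if $line =~ /^From:\s+"([^"]*)"\s+<($user\@$host)>\s*$/;
    # From: Their Name <user@host>
    return ($2, $1) if $line =~ /^From:\s+([^"<>()]+?)\s+<($user\@$host)>\s*$/;
    # From: user@host
    return ($1, undef) if $line =~ /^From:\s+($user\@$host)\s*$/;
    # From: user@host (Their Name)
    return ($1, $2) if $line =~ /^From:\s+($user\@$host)\s+\(([^()]*)\)\s*$/;
    return;   # one of the rare hard cases: give up
}

my ($addr, $name) = parse_from('From: Their Name <user@host>');
print "$name is at $addr\n";
```

The silly Pete example falls through all four patterns and comes back as an empty list, which is the right behavior for a quick hack: fail visibly rather than guess. 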

So, if you can make do without total coverage, your time is 
probably better spent with just a simple regexp to match the most 
common formats. Although generally, your time is best spent skim- 
ming CPAN and the Perl FAQ first, to see whether someone’s done 
all the work for you already. For problems where that method doesn’t 
turn up anything, you must decide whether you can get away with 
a quick hack that does most of the work. Doing something by the 
book is almost always the right thing to do for keeping the code 
maintainable, extensible, and debuggable. But if doing the right 
thing means the job will take impossibly long, then it may be time 
to look into doing the wrong thing. 

This is exactly the situation I faced when writing 
Text::Unidecode; I figured out that many scripts (like Cherokee, 
Amharic, Greek, Coptic, Cyrillic, Armenian, Georgian, Yi, and 
Korean hangul, to name a few) were almost no problem at all, and 
could be translated a character at a time by looking in a hash that 
said that thus-and-such Cyrillic character is to be represented by 
“b”, thus-and-such Cherokee glyph is to be represented everywhere 
by “la”, and so on. Some scripts, like Divehi and the dozen Indic 
scripts (Hindi, Bengali, Telugu, Burmese, etc.), could probably be 
tackled by a few regexps, with some recourse to advanced regexp 
features such as lookahead assertions. But it gets tricky — as the 
Indic scripts get odder, I probably would be able to learn enough of 
the language to make sense of its writing system in just a few days. 
But it would take a serious investment of time, for each of these 
languages, to learn the language well enough to know whether my 
transliteration algorithm was doing something wrong. 

As I considered going toward harder scripts like Thai and Chinese, 
and then on to really hard (but also really important) writing systems 
like Japanese, Hebrew, and Arabic, it looked as though each would 
require at least months (and probably years) of effort. I like writing 
open source software, and I like learning languages, but any project 
that would require me to become literate in Arabic, Hebrew, and all 
the major languages of Asia clearly needed some scaling back. If 
nothing else, by the time I finished this (i.e., in the distant future), 
Unicode would probably finally be well supported, at which point few 
people would need anything like Text::Unidecode. 

One way to tackle this would be to do what I could with the 
more straightforward writing systems (Cyrillic, etc.), then look for 
existing algorithms for the harder languages (e.g., using some of 
the great work that’s already been done toward automatic analysis 
of Japanese), and encourage friends and friends-of-friends to write 
algorithms for the languages they personally know well. (This 
alone would cover a good six or seven main languages of India.) 
But this would still leave great big gaps; I would run into a good 
number of languages (probably including Georgian, Syriac, Coptic, 
and Mongolian Old Script) where I would find no one to advise me 
and no texts in Unicode to test my transliterator against. In short, 
doing the right thing either alone or collaboratively would either 
take a long time, or still be shoddy, or both. 

So, I decided that what was feasible was a (relatively) quick 
hack — a transliteration algorithm whose view of things was inher- 
ently too simple to work right, but which would still work well 
enough most of the time, and which could still be done in a rea- 
sonable amount of time. That would mean that I’d have time to 
cover all of Unicode. People could (and will!) still produce smart 
transliterators for any language that they wanted done properly, but 
Text::Unidecode could take up the slack. 


Internals of Text::Unidecode, and Surveying the Damage 


I decided that Text::Unidecode should be just a wrapper around 
this one operation: 


s/([^\x00-\x7f])/$Unicode2Ascii{$1}/g; 


This simply replaces every character above 0x007F with 
$Unicode2Ascii{that character}, without any regard to the 
context. This approach kept Text::Unidecode from being about complex 
rules formalized as dozens of regexps, and made it just about 
considering each Unicode glyph and guessing the one best way to 
transliterate it. That worked just fine for “straightforward” scripts (Greek, 
Cherokee, Cyrillic, etc.), but let’s consider how that works for Divehi. 
We figured out the null consonant X and the null vowel 0 are, by 
themselves, basically just placeholders, and the only reason they’re 
needed is because in written Divehi you can’t have a vowel on its own 
or a consonant on its own. That would make us want to say that the 
%Unicode2Ascii entry for each should just be “”, an empty string. But 
then the special pair X0 (meaning glottal stop, or meaning to double the 
next consonant) would disappear without a trace, and baX0pa, boX0, 
and taX0yáru would come out as “bapa”, “bo”, and “tayAru”; whereas 
we’d prefer something more like “bappa”, “bo’”, and “tayyAru”. 

After much consideration, I decided that the way to do this, for 
context-insensitive Text::Unidecode, would be to have the null 
vowel 0 just delete, but have the null consonant X be replaced with 
an apostrophe. To illustrate: 


Input in Divehi letters | Unidecode output 

divehi                  | divehi 
ran0                    | ran 
baX0pa                  | ba’pa 
boX0                    | bo’ 
taX0yáru                | ta’yAru 
Xe                      | ’e 
XiXa                    | ’i’a 
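
Here is a small sketch of that context-insensitive lookup, using a cut-down table of my own rather than the module’s real one. The b, p, and o code points (U+0784, U+0795, U+07AE) come from the Unicode Thaana chart; the “[?]” fallback for characters missing from this toy table and the sub name unidecode_sketch are assumptions made purely for illustration. 

```perl
use strict;
use warnings;

# One fixed output per character, no context: the null consonant becomes
# an apostrophe and the null vowel becomes nothing.
my %Unicode2Ascii = (
    "\x{0780}" => "h", "\x{0782}" => "n", "\x{0783}" => "r",
    "\x{0784}" => "b", "\x{0787}" => "'", "\x{0788}" => "v",
    "\x{078B}" => "d", "\x{0795}" => "p",
    "\x{07A6}" => "a", "\x{07A8}" => "i", "\x{07AC}" => "e",
    "\x{07AE}" => "o", "\x{07B0}" => "",
);

sub unidecode_sketch {
    my ($text) = @_;
    # Assumption: characters missing from this toy table show up as "[?]".
    $text =~ s/([^\x00-\x7f])/exists $Unicode2Ascii{$1} ? $Unicode2Ascii{$1} : '[?]'/ge;
    return $text;
}

print unidecode_sketch("\x{0784}\x{07A6}\x{0787}\x{07B0}\x{0795}\x{07A6}"), "\n"; # ba'pa
print unidecode_sketch("\x{0787}\x{07A8}\x{0787}\x{07A6}"), "\n";                 # 'i'a
```

Compared with the three-regexp version, all the intelligence has moved out of the code and into the choice of table entries. 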


My criterion here was that I tried to imagine not whether someone 
who knew no Divehi could make sense of a single word in this tran- 
scription, but whether someone who did know Divehi could make 
sense of several sentences of this. I figured that while the above 
system produces the completely superfluous apostrophes in “’e” 
and “’i’a”, people will not see any clear meaning for them. They 
will see that if you ignore those apostrophes, the word (“e” or “ia”) 
makes sense in context. In cases where the apostrophe is what’s left 
of an X0 as in “bo’” and “ba’pa”, people should be able to infer its 
function(s) from context. And after a line or two, it will probably 
dawn on the Divehi reader that the apostrophe is just a stand-in for 
the Divehi letter for null consonant. 

This means people may have a bit of work to do the first time 
they see Text::Unidecode output, but it’s always work to read text 
in an alphabet that you’re not used to reading it in. In cases where 
the output actually gets a bit mangled, people will have a harder 


time, but people are generally pretty good at figuring out what 
things mean from context, even if it’s distorted. In any case, it’s 
alas better than the entire message being visible only as “????”. 


Indic scripts are a good example of how something a bit more 
complex still generally survives the mayhem of Text::Unidecode’s 
algorithm. Indic scripts are basically (in the encoding) like Divehi, 
except that vowels don’t need a null consonant, and you don’t 
bother writing the short “a” vowel. So, if you have a consonant that 
has no vowel (or no null-vowel) after it, it must have an “inherent” 
(implicit) short /a/. So, for example, in the Malayalam script, the 
word “malayalam” is encoded as mlyálm0. (The final 0 null vowel, 
really character 0x0D4D, is there to keep the final m from being 
read as “ma”.) A context-sensitive algorithm could insert “a” 
characters as appropriate, but my Text::Unidecode 
one-context-fits-all approach has to have one representation for the Indic m 
codepoint, in spite of the fact that sometimes it means “m” and 
sometimes it means “ma”. 

I observed that most of the time, Indic script m means just “m”, 
not “ma”, suggesting that I should transliterate it as “m”. For the 
cases where it really did mean “ma”, people would usually be able 
to infer that something was missing, and then be able to imagine 
what it was. So, if you spoke Malayalam and you saw a word like 
“mlyaalm” in Roman script, you would know that “mly” wasn’t a 
possible way to start a word in your language, and you’d infer that 
something was missing. Context, or even the barest knowledge of 
the normal written form of the language, would lead you to infer 
that it’s an “a”. 


Conversely, if I had decided that Indic script m should be 
represented as “ma”, mlyálm0 would come out as “malayaaalama”. 
That’s only slightly strange, but it’s very bad for syllables with 
different vowels. For example, the personal name Abhijit is encoded 
as aBijít0. If we assume every consonant has no “inherent a” (so m 
is “m”, not “ma”), then that comes out “abhijiit”, or “aBijiit”, 
depending on how we decide to Romanize the “special b” — as 
“B” or as “bh” (I settled on the latter, since it’s more standard). But 
if we decide that every consonant did have an “inherent a” (so m is 
“ma” not “m”), aBijít0 comes out as “abhaijaiita”, which is far 
afield of how anyone thinks of it or pronounces it. The grand 
lesson here is tht if y lv lttrs ot, ppl cn stll make sense of it, but ifa 
yaou gao araounada inasaeratainaga laetataerasa, the result is pretty 
confusing. This rule bodes well for Text::Unidecode output for 
Arabic and Hebrew, where the normal written form of the language 
is missing vowels. 
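
The two choices are easy to compare side by side. This sketch works on my ad hoc notation rather than on real Indic code points, with long vowels written as capital A and I to stay within ASCII; the tables and the sub name romanize are made up for the illustration. 

```perl
use strict;
use warnings;

# Ad hoc notation: a, i = short vowels; A, I = long vowels;
# 0 = null vowel; B = the "special b"; other letters = plain consonants.
my %vowel     = (a => "a", i => "i", A => "aa", I => "ii", 0 => "");
my %consonant = (m => "m", l => "l", y => "y", j => "j", t => "t", B => "bh");

# $inherent is what every consonant drags along: "" or "a".
sub romanize {
    my ($encoded, $inherent) = @_;
    return join '', map {
        exists $vowel{$_} ? $vowel{$_} : $consonant{$_} . $inherent
    } split //, $encoded;
}

for my $word (qw(mlyAlm0 aBijIt0)) {
    printf "%-8s  no inherent a: %-10s  inherent a: %s\n",
        $word, romanize($word, ""), romanize($word, "a");
}
```

Leaving letters out gives “mlyaalm” and “abhijiit”; putting the extra vowel in gives “malayaaalama” and “abhaijaiita”. 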


The Special Chinese Problem 

Most of what I’ve written so far assumes that each Unicode 
symbol has pretty much one or two closely related pronunciations, 
which we then pick from based on context. This assumption 
falls apart when we get to Korean, Japanese, and Chinese — the 
languages that use Chinese characters (or “Han” characters, in 
Unicode jargon). Consider, for example, this character, Unicode 
0x4E0B: 


下 


If the text is in Korean, this character will be pronounced 
“ha”, and should probably be transliterated as such. If it appears 
in Chinese text, a Mandarin speaker will pronounce it “xia”, and 
a Cantonese speaker will pronounce it “ha”. If that character is 
in a Japanese text, it will be pronounced as “shita”, “shimo”, 
“moto”, “ka”, or “ge” — and which it is, depends on complex 
contextual factors. 

In Text::Unidecode’s table of what transliterates as what, there 
is no allowance for any kind of context, even contextual guesses 
about what language the text is in. I had on hand the “Unihan database” 
that says what the pronunciations are for most of the Chinese 
characters in Unicode, for all the languages that use each particular 
character — but I could pick only one pronunciation per character. 
Somebody had to lose. These are the things I considered: 


* More people speak Chinese than Japanese. 

* More people speak Japanese than Korean. 

* Of people who speak Chinese, most can understand Mandarin 
pronunciations, as it’s the national standard dialect of mainland 
China. 

* The Korean and Japanese pronunciations are often derived from 
the Chinese pronunciations. It rarely if ever went the other way. 

* A Japanese or Korean person is more likely to have studied 
Chinese (meaning modern spoken Mandarin), than a Chinese 
person is to have studied Japanese or Korean. 

* If the Japanese, Korean, and Mandarin pronunciations are rather 
different, there is a small chance that a Japanese or Korean reader 
will be able to understand the Mandarin pronunciation, but nearly 
no chance that a Chinese person will be able to understand the 
Korean or Japanese pronunciations. 

* Most importantly, if you take the Mandarin pronunciations as the 
standard, then context-insensitive transliteration of Chinese text 
works almost perfectly. Japanese needs more context sensitivity, 
and if you take the most common Japanese pronunciation as the 
standard, context-insensitive transliteration of Japanese text 
doesn’t work very well. 


In other words, if I took Mandarin as the canonical pronunciation 
for Chinese/Japanese/Korean Unicode characters, it’d do pretty 
well for a whole lot of text for a whole lot of people. If I chose 
Japanese as the canonical pronunciation, it’d work less well, and 
would be good for fewer people. While Text::Unidecode is not nec- 
essarily a majoritarian project, I do like to please the most people 
while frustrating the fewest. 
So, that’s why when you enter this: 


use Text::Unidecode; 
print unidecode( 
    "\x{5317}\x{4EB0}" 
    # Those are the Chinese characters for the 
    # name of the capital city of mainland China. 
); 


You get this Mandarin Romanization: 

Bei Jing 


Instead of Japanese or Korean attempts at transliterations of those 
characters’ pronunciations (“Kita Miyako” and “Pwuk Kyeng”, 
respectively). 
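
The “one pronunciation per character” constraint can be sketched as flattening a per-language table into a context-free one. The readings for U+4E0B are the ones quoted earlier in this section; the data layout and the name build_table are hypothetical and have nothing to do with the real Unihan file format or with Text::Unidecode’s actual tables. 

```perl
use strict;
use warnings;

# Readings for one character, as given in the text above.
my %readings = (
    "\x{4E0B}" => {
        mandarin  => "xia",
        cantonese => "ha",
        korean    => "ha",
        japanese  => [qw(shita shimo moto ka ge)],
    },
);

# Flatten to one reading per character for the chosen language.
sub build_table {
    my ($preferred) = @_;
    my %table;
    for my $char (keys %readings) {
        my $r = $readings{$char}{$preferred};
        # Several readings and no context to choose by: this sketch
        # arbitrarily keeps the first one listed.
        $table{$char} = ref($r) ? $r->[0] : $r;
    }
    return %table;
}

my %Unicode2Ascii = build_table('mandarin');
print $Unicode2Ascii{"\x{4E0B}"}, "\n";   # xia
```

Calling build_table('japanese') shows the problem at once: four of the five readings have to be thrown away. 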


Future Developments 

I don’t think of Text::Unidecode as being the last word on the 
subject of transliteration — not by a long shot. It’s a cheap hack 
whose coverage of world languages is very broad but very very 
thin. It does pretty well with most alphabets, is so-so with Indic 
scripts, and its solution to Chinese/Japanese/Korean isn’t good, but 
is the least bad. I hope people will write smart transliterators for the 
Indic scripts, Thai, Japanese, and whatever else they feel a need for; 
and I hope they’ll put them in CPAN. But for whatever languages 
are left over, you can always fall back on Text::Unidecode. 

1. Incidentally, you may have noticed that while I’ve been representing long a’s 
as “á”, as in “hurihákame’”, the above part of %Divehi2roman would give us 
a capital instead, as in “hurihAkame’”. Since Text::Unidecode is, apparently, 
for use in systems that lack full Unicode support, I decided to play it safe 
and assume that they don’t even have Latin-1 support either — so I use only 
US-ASCII characters, involving no Latin-1 characters like “á”. In this case, 
since there’s no uppercase/lowercase distinction in Divehi script, nothing on 
input will give us an “A”, so we’re free to use that character for long a’s. 


HELPING THE DISABLED WITH PERL/TK 

with Jon Bjornstad 


Jon Bjornstad recently presented a “Lightning Talk” at the 2001 
O’Reilly Open Source Convention. He discussed a program 
that he had written for his friend, Sue Simpson. She is a mute 
quadriplegic, and Jon’s program allows her to “speak”, read online 
texts, and browse photos. The program is remarkably clever in its 
use of Perl/Tk to overcome various obstacles in the creation of an 
accessible GUI. 

This program wasn’t Jon’s first volunteer effort; he earlier 
developed a complete administration program for a school and a Web 
store for a community project. A long-time C programmer, Jon 
found Perl five years ago and hasn’t gone back since. Given the 
rapid development and assorted features he needed for Sue’s program, 
it’s a decision he has never regretted. 

Jon met Sue in 1986 through a mailing list request to help her 
install software, and they became friends. For years he helped 
configure and program a device called the Express 3, a rectangular 
array of LEDs to which Sue could point with a light pen attached to 
her glasses. As that device became obsolete, newer devices were 
found to be too expensive, and other options did not seem to fit her 
needs. Jon began writing a program specifically tailored for Sue that 
would allow her to easily type and communicate. Since then, the 
program has continued to evolve into a full user environment. See 
Figure 1. 


PACKAGES USED 
Tk, Tk::JPEG, Win32::Sound, Win32::OLE, AnyDBM_File, SDBM_File 


A Text Interface for Speaking 

The first feature Jon created was an online keyboard (Figure 2) 
to allow Sue to input data. Because she cannot click a mouse 
button, the ability to select onscreen elements with movement 
presented an initial challenge. Jon employed the rarely used Enter and 
Leave motion callbacks for a Label, in combination with a timer. 
The following simplified program illustrates this 
functionality and is shown in Figure 3. First, a window (the 
MainWindow object class) and a label to contain the value of variable 
$msg are created: 


use Tk; 

my $mw = MainWindow->new(-bg => "white"); 

my $msg = ""; 
$mw->Label( 
    -textvariable => \$msg, 
    -font => "Arial 18 bold", 
    -width => 30, 
    -bg => "skyblue", 
    -anchor => 'w', 
    -relief => 'ridge', 
)->pack; 


Next, a subroutine show is defined, which will later serve as a 
callback function. The action here is dependent on the first parameter 
to this function, stored as $let: 


sub show { 
    my $let = shift; 
    if ($let eq "Quit") { 
        exit; 
    } elsif ($let eq "Clear") { 
        $msg = ""; 
    } elsif ($let eq "Del") { 
        chop $msg; 
    } else { 
        $msg .= $let; 
    } 
} 


Having defined the necessary elements and callback, we can popu- 
late the window with labels. Note that we are detecting the Enter 
and Leave events for each, and binding them to a subroutine reference, 
which in turn calls the show() callback: 


my ($lab, $timer); 
for my $let (qw(A B C D E Del Clear Quit)) { 
    $lab = $mw->Label( 
        -text => " $let ", 
        -font => 'Arial 18 bold', 
        -bg => length($let) > 1 ? 'violet' : 'lightgreen', 
        -relief => 'ridge', 
    )->pack; 
    $lab->bind("<Enter>", sub { 
        if ($let eq "Del") { 
            $timer = $mw->repeat(500, [ \&show, $let ]); 
        } else { 
            $timer = $mw->after(500, [ \&show, $let ]); 
        } 
    }); 
    $lab->bind("<Leave>", sub { 
        $timer->cancel; 
    }); 
} 


After looping to create the buttons (labels), the timers are imple- 
mented with the MainWindow method calls. When the mouse hovers 
over any label other than “Del”, a timer after() function is called, 
which will delay the selection of that label. Once the time has 
passed, show() is called, and the action appropriate for the value of 
$let will execute. 

This final piece of code starts the main Tk code loop: 


MainLoop; 

Using code similar to this, Sue is able to select any screen 
elements by hovering over them for an interval with no required 
mouse buttons. In practice, Sue uses a “head mouse” 
(http://www.orin.com/access/headmouse/phm.htm) fixed to her 
glasses, rather than the mouse. 


Word Prediction 


Typing an entire word can be time-consuming and laborious. 
For this reason, most text-input interfaces of this type perform a 
word prediction (Figure 4). As the letters are selected, the list 
shortens to reflect the matching words. Consider that the letters 
“esp” prefix 48 words in an unabridged dictionary, but the only 
ones commonly used are: 


ESP 
especially 
Esperanto 
espionage 
espousal 
espresso 
esprit 

By displaying the list after each letter is selected, Sue can select 
the word she wants after entering only a few letters. Jon decided to 
implement this feature with words Sue had typed previously, order- 
ing the list according to the number of times she had used them. 
With the following excerpt, the list is read from or written to a plain 
text file rather than a DBM file (to facilitate manual editing): 


Figure 1: Sue 

Figure 2: Keyboard 

my %times;  # key is the word 
            # value is number of times it had been used 

sub read_words { 
    my ($word, $freq); 
    open IN, "words.txt" or die "cannot open words.txt\n"; 
    while (<IN>) { 
        chomp; 
        ($word, $freq) = split; 
        $times{uc $word} = $freq; 
    } 
    close IN; 
} 

sub by_freq { 
    $times{$b} <=> $times{$a} or 
    $a cmp $b; 
} 

sub write_words { 
    open OUT, ">words.txt" or die "cannot open words.txt\n"; 
    for my $w (sort by_freq keys %times) { 
        print OUT "$w\t$times{$w}\n"; 
    } 
    close OUT; 
} 


The program can then search the list for words. For example, to 
search for words beginning with a prefix: 


Figure 3 Diagram 1


Figure 4 Prediction


my (@words);


my $maxwords = 10;    # actually 11 in all


for my $i (0 .. $maxwords) {
    $words[$i] = "";
    $mw->Label(
        -textvariable => \($words[$i]),
        # ... all other attributes
    )->pack;
    # ... Enter and Leave callbacks as above
}


sub clear_words {
    for my $w (@words) {
        $w = "";
    }
}


sub fill_words {
    clear_words;
    my $prefix = Msg->last_word;
    my $n = 0;

    for my $w (sort by_freq
               grep { /^$prefix/ and
                      $times{$_} > 1 }
               keys %times) {
        $words[$n] = $w;
        return if ++$n > $maxwords;
    }
}


The labels here are created with a textvariable attribute pointing
to one of the elements in the @words array. The function
fill_words() is called after each letter is added to the message
window. Msg->last_word returns the last blank-separated word in the
message window, and this word is then used as a prefix to search for
subsequent matching words. Note that all the keys in the %times hash
are considered, but only those matching the prefix are included:


grep { /^$prefix/ and $times{$_} > 1 }


A further restriction is to include only 
words that have been used more than once, 
which helps eliminate typos and mis- 
spellings. The resulting list is sorted by 
frequency, so the words near the top will be 
the ones most often selected. As a word is 
used more frequently, it will “rise” in the 
list to be more readily chosen. 
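
The filtering and ordering just described can be tried outside the Tk
interface. The following is a minimal sketch, not Jon's code: the
predict() function, its $max argument, and the sample word counts are
hypothetical names and data chosen for illustration. It also wraps the
prefix in \Q...\E, a precaution (not in the original) so that punctuation
in the typed text is not treated as regular expression syntax.

```perl
use strict;
use warnings;

# Hypothetical sample data: word => number of times it has been used.
my %times = (
    ESPECIALLY => 7,
    ESPRESSO   => 3,
    ESPIONAGE  => 1,    # used only once, so it is filtered out
    ESP        => 3,
    EXIT       => 9,
);

# Return up to $max words that start with $prefix, most used first.
# Ties in frequency are broken alphabetically, as in by_freq.
sub predict {
    my ($prefix, $max) = @_;
    my @matches =
        sort { $times{$b} <=> $times{$a} or $a cmp $b }
        grep { /^\Q$prefix\E/ and $times{$_} > 1 }
        keys %times;
    $#matches = $max - 1 if @matches > $max;    # keep only the first $max
    return @matches;
}

print join(" ", predict("ESP", 11)), "\n";    # prints "ESPECIALLY ESP ESPRESSO"
```

Sorting the whole filtered list on every keystroke is simple and, for a
personal vocabulary of a few thousand words, likely fast enough; a larger
word list might call for a pre-sorted structure instead.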

All these features combine to allow Sue 
to quickly input words to form sentences. 
Once a sentence is entered, the program 
is able to speak it with a text-to-speech 
synthesizer. The synthesizer is a freely 
downloadable Microsoft program, which 
interfaces easily with Perl. 


Texts and the Dictionary 

Another feature of Sue’s program is 
the ability to read online texts down- 
loaded from the Gutenberg project 
(http://www.gutenberg.org). She can
browse the texts as a hierarchy, jump
quickly to sections, and bookmark pages.
See Figure 5.

Soon after Jon created this functionality, Sue came across this
passage while reading The Adventures of Tom Sawyer by Mark
Twain:


“... he uncovered an ambuscade, in the person of his aunt; 


... her resolution became adamantine in its firmness.”


She turned to her daughter and asked, “Do you think Jon could get a 
dictionary into this thing?” Jon accepted the challenge and devised a 
solution whereby pausing over words in the text would pop up the dic- 
tionary definitions (Figure 6). Starting with the 1913 Webster’s 
Dictionary (also from the Gutenberg project), and then using the
OPTED project (http://www.mso.anu.edu.au/~ralph/OPTED), which formats
the text in consistent HTML, one word per line, Jon converted the
dictionary into a DBM file for quick access. An excerpt from the
dictionary looks like this:


<P><B>Abacus</B> (<I>n.</I>) A table or tray strewn 
with sand, anciently used for drawing, calculating, etc.</P> 
<P><B>Abacus</B> (<I>n.</I>) A calculating table or frame; 
an instrument for performing arithmetical calculations by 
balls sliding on wires, or counters in grooves, 
the lowest line representing units, the second line, 
tens, etc. It is still employed in China.</P> 
<P><B>Abada</B> (<I>n.</I>) The rhinoceros.</P> 
<P><B>Abaddon</B> (<I>n.</I>) The destroyer, or angel of the 
bottomless pit; - the same as Apollyon and Asmodeus.</P> 


To index the 17 MB of data with DBM files, the following snippet
was used:


use strict;
my (%dict, $last, $word);


unlink <*_dict.*>;    # tidy up any old ones


for my $let ('a' .. 'z') {
    dbmopen %dict, "${let}_dict", 0777;
    open IN, "wb1913_$let.html" or
        die "cannot open wb1913_$let.html\n";
    $last = "";
    while (<IN>) {
        # the headword is in bold at the start of each entry
        next unless ($word) = m#^<P><B>(.*?)</B>#;
        next if $word eq $last;
        $dict{$word} .= (tell(IN)-(length)-1) . " ";
                            # -1 because of ^M at the end
                            # of the lines
        $last = $word;
    }
    close IN;
    dbmclose %dict;
}

This creates a separate DBM file (.pag and .dir) for each letter
of the alphabet, with the key as the word and the value as the 
concatenation of the seek addresses within the .html file con- 
taining the word’s definitions. With the DBM files in place, it is 
easy to quickly provide a define() subroutine to display the 
definition for $word. 


my %dict;
$word = ucfirst lc $word;
my $fl = lc substr($word, 0, 1);
dbmopen %dict, "dictionary/${fl}_dict", 0666
    or die "cannot dbmopen ${fl}_dict: $!\n";
open IN, "dictionary/wb1913_$fl.html" or
    die "could not open dictionary/wb1913: $!\n";


my $addrs = $dict{$word};    # the seek addresses
return unless $addrs;
insert("$word\n");           # insert into a text widget


my $line;
for my $a (split /\s+/, $addrs) {
    seek(IN, $a, 0) or die "cannot seek to $a in dict for $word $!\n";
    while (defined($line = <IN>) and
           $line =~ m%
               ^<P><B>$word</B>\         # must match the word
               \(<I>(.*?)</I>\)\         # part of speech (capture 1)
               (.*)</P>$                 # the definition (capture 2)
           %x) {
        insert("\n $1");
        my $def = $2;
        my $i;
        while (length($def) > 50) {      # wrap lines at 50 chars
            $i = rindex($def, ' ', 50);
            insert("\t " . substr($def, 0, $i) . "\n");
            $def = substr($def, $i+1);
        }
        insert("\t $def");
    }
}
close IN;
dbmclose %dict;
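
The index-then-seek technique used above can be seen in isolation in the
following minimal sketch. It is an illustration under stated assumptions
rather than Jon's code: a plain hash (%offset) stands in for the DBM file,
a temporary file with two sample entries stands in for the wb1913 files,
and lookup() is a hypothetical helper name. The start of each line is
tracked with tell() before the line is read, which avoids the line-ending
arithmetic needed when subtracting the line length afterward.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data in the one-entry-per-line layout.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "<P><B>Abacus</B> (<I>n.</I>) A calculating table or frame.</P>\n",
           "<P><B>Abada</B> (<I>n.</I>) The rhinoceros.</P>\n";
close $tmp;

# Pass 1: remember the byte offset at which each headword first appears.
my %offset;    # a plain hash stands in for the DBM file
open my $in, '<', $path or die "cannot open $path: $!\n";
binmode $in;
my $start = tell($in);
while (my $line = <$in>) {
    if ($line =~ m#^<P><B>(.*?)</B># and not exists $offset{$1}) {
        $offset{$1} = $start;
    }
    $start = tell($in);    # the next line begins here
}

# Pass 2: jump straight to an entry and pull out its parts.
sub lookup {
    my ($word) = @_;
    return unless exists $offset{$word};
    seek($in, $offset{$word}, 0) or die "cannot seek: $!\n";
    my $line = <$in>;
    return unless $line =~ m#^<P><B>\Q$word\E</B> \(<I>(.*?)</I>\) (.*)</P>#;
    return ($1, $2);    # part of speech, definition
}

my ($pos, $def) = lookup("Abada");
print "Abada ($pos): $def\n";    # prints "Abada (n.): The rhinoceros."
```

Storing offsets rather than the definitions themselves keeps the index
small, at the cost of one seek and one read per lookup.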


With the above, it is simple to truncate suffixes and add definitions 
for the resultant root word: 


Figure 5 Reader


my $w;

for my $suffix (qw(s ly ed ing es ness)) {
    $w = $word;    # restore the original
    define($w) if $w =~ s/$suffix$//;
}
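
To see which root words this loop would try, the same substitution can be
run on its own. The sketch below is illustrative only; candidate_roots()
is a hypothetical helper that collects the candidates instead of calling
define() on each one.

```perl
use strict;
use warnings;

# Return the candidate roots produced by removing one suffix at a time.
sub candidate_roots {
    my ($word) = @_;
    my @roots;
    for my $suffix (qw(s ly ed ing es ness)) {
        my $w = $word;    # start from the original word each time
        push @roots, $w if $w =~ s/$suffix$//;
    }
    return @roots;
}

print join(" ", candidate_roots("wishes")), "\n";    # prints "wishe wish"
```

Not every candidate is a real word ("wishe" above). That is harmless in
this design, because the dictionary lookup returns early when a word has
no seek addresses.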


The final step of the dictionary feature was to make it convenient 
for Sue to simply point the cursor at a word within the text and have 
the definition of that word appear. The tricky part was determining 
which word was under the cursor at any given moment. A careful
perusal of the book Learning Perl/Tk, by Nancy Walsh, gave several
suggestions. Jon began by creating a Define Tk label, which
executes the following code when activated:


$lastw = ""; 

$firstTime = 0; 

$mw->bind("<Motion>", \&show); 
$readwin->configure(-cursor => "question_arrow"); 


Binding <Motion> events in this way should be the exception since 
they are activated by any mouse motion, causing many invocations 
of show(). Note that above, $readwin is the text widget containing 
the words we are reading. When the program switches to this 
“definition mode”, the cursor for $readwin is changed to a little 
question mark. This cursor is used to pass over words in order to 
select the word to define. 
Next, the show callback: 


sub show {
    my $e = $readwin->XEvent;
    return unless $e;
    my ($x, $y, $time) = ($e->x, $e->y, $e->t);
    my $p = "\@$x,$y";
    my $w = $readwin->get("$p wordstart", "$p wordend");
    return unless length($w) > 3;

    if ($w ne $lastw) {
        $lastw = $w;
        $firstTime = $time;
    } elsif ($time - $firstTime > 1000) {
        $mw->bind("<Motion>", undef);
        $readwin->configure(-cursor => "arrow");
        Dict->define($w);
    }
}

Figure 6 Dictionary


The fancy footwork with XEvent gives us the exact (x,y) coordinate
of the cursor and the time (in milliseconds) it passed over that spot.
The get() method of the Text widget tells us what text is under a
specific spot. With the wordstart and wordend modifiers, we can
isolate the precise word under the cursor.

A few “catch” rules were used to prevent unintended words
from being defined: all words of three characters or fewer are
ignored, and $lastw and $firstTime keep track of how long a word
has been hovered over before the pop-up definition is activated.
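
The dwell-time rule can be separated from Tk and exercised with a list of
simulated events. This is a minimal sketch under stated assumptions, not
the program's actual code: hover(), $DWELL_MS, $MIN_LEN, and the event
list are hypothetical names and data introduced for illustration.

```perl
use strict;
use warnings;

my $DWELL_MS = 1000;    # hover time required before a word is defined
my $MIN_LEN  = 4;       # ignore words of three characters or fewer

my ($lastw, $first_time) = ("", 0);

# Feed in the word under the cursor and the event time in milliseconds.
# Returns the word once the cursor has stayed on it long enough, else undef.
sub hover {
    my ($w, $time) = @_;
    return undef if length($w) < $MIN_LEN;
    if ($w ne $lastw) {    # moved onto a new word: restart the clock
        ($lastw, $first_time) = ($w, $time);
        return undef;
    }
    return $time - $first_time > $DWELL_MS ? $w : undef;
}

# Hypothetical stream of (word, timestamp) motion events.
for my $event (["the", 0], ["Latitude", 100], ["Latitude", 600], ["Latitude", 1200]) {
    my $hit = hover(@$event);
    print "define $hit\n" if defined $hit;    # prints "define Latitude" once
}
```

Because the clock restarts whenever the word changes, sweeping the cursor
across a line of text never triggers a definition; only resting on one
word does.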


The (Not So) Final Program 

The complete program also gives Sue the ability to change dis- 
play colors and browse photo albums of family and friends. Jon 
plans to add more features, including an X10 interface so that Sue 
can control her lights and television from within the program. 

Jon’s program is a good example of the flexibility of Perl. 
Because it’s Perl-based, the program runs equally well on UNIX 
and Win32. Development was rapid, with the base functionality 
taking only a week of Jon’s spare time. Updates are simple, and the 
program is easy to extend and maintain. Programming knowledge 
and creativity can go a long way, especially with tools as powerful 
as Perl. 

The entire program is now available on Jon’s Web site at:
http://www.icogitate.com/~perl/sue/.


Dan Brian is a software engineer at Verio, Inc. He likes cake. 


Jon Bjornstad is a certified Perl hacker and an amateur pianist, and tries to
cultivate a quiet and open mind.


Stem Systems Releases Perl
Networking Toolkit


Stem Systems released Stem, a network application suite and 
development toolkit. According to the company, network manage- 
ment is enhanced by Stem’s collection of high-level modules that 
provide common network services and resources. You can create 
applications by connecting Stem modules together in a configura- 
tion, with little or no programming required. 

Stem includes support for clients/servers, network communi- 
cation, log file management, process management, multiplexing 
I/O, message passing, and remote or local runtime configuration. 
Stem comes with several example applications, including an 
inetd emulator created by a configuration of standard modules, 
which requires no extra coding. 

Stem is written in pure Perl and is Open Source and free to 
download and use. Go to http://stemsystems.com for documen- 
tation and to download the tarball. 

For more information, contact: Stem Systems, Inc., 521 Green
Street, Iselin, NJ 08830; Phone: (732) 283-8700; Fax: (732) 636-8673;
Internet: info@stemsystems.com; WWW: http://www.stemsystems.com.


LokBox Software Announces 
InternetPeriscope 


LokBox Software announced the release of InternetPeriscope, 
its Internet server monitoring and auditing software. According 
to the company, InternetPeriscope includes features and tools 
that may be useful to systems administrators of small and large 
networks of Internet servers. The product includes monitoring 
tools and intrusion detection devices that can help manage intru- 
sion prevention and tracking, as well as hacker detection and 
identification. 

InternetPeriscope helps to monitor a network’s Web, DNS, 
SMTP, and POP (mail) servers for real-time and potential prob- 
lems, notifying systems administrators of any problems via a 
graphical interface. The security features analyze servers for hacker 
vulnerabilities, provide real-time detection of hacker intrusions, 
and maintain a history of intrusion attempts. 

For more information, contact: LokBox Software, 236 West 
Portal Ave. #769, San Francisco, CA 94127-1423. A 30-day trial is 
available from: http://www.lokboxsoftware.com. 


Palisade Releases PacketHound 2.0 
Palisade Systems, Inc. announced the release of version 2.0 
of the PacketHound Protocol Management Appliance, a network 
appliance that allows organizations to monitor or eliminate the 
use of emerging protocols and applications. According to the 
company, PacketHound can monitor, manage, or block the use
of Aimster, Code Red, DoubleClick, Gnutella, iMesh, Napster,
Real Media 1 and 2, Scour Media Exchange, ShoutCast,
WebRadio, and Windows Media 1 and 2, among others, in addition to
custom protocols on an organization’s network. PacketHound
2.0 can block subdivisions of existing applications like Napster,
which can now be blocked entirely or partially (if, for example,
you wish to block uploads but not downloads).

PacketHound 2.0 adds 60 new applications and protocols to the 
list of those it can manage. Included in this list are AOL Instant 
Messenger, FTP, HTTP, ICQ, Finger, KaZaA, Telnet, and POP,
among others. The Code Red signature can be used to prevent 
infection of internal Microsoft Servers, and its logging and reporting 
capabilities can allow a systems administrator to find infected 
internal servers so corrective action can be taken. 

For more information, contact: Palisade Systems, Inc., 2625 
North Loop Drive, Suite 2120, Ames, IA 50010; Phone: (515) 296- 
5494; WWW: http://www.palisadesys.com/products/packethound. 


Naturetech Announces Portable SPARC 
Workstation 


Naturetech announced a notebook computer using technology
from Sun Microsystems, Inc. The company’s new Ultra-NoteStation
777S is the first notebook computer powered by Sun’s
UltraSPARC IIe microprocessor. According to the company, the
Ultra-NoteStation 777S runs the Solaris 8 operating environment,
which brings GUI user-friendliness to the UNIX operating system.
The 777S introduces new lows in size, weight, power consumption, 
and cost to the SPARC computer systems market and offers bene- 
fits of mobile computing to Sun’s user base. 

Starting with a 15-inch LCD display, the 777S includes the 
necessary I/O interfaces such as USB, serial port, VGA output, 
and Ethernet. Its design includes an 8X DVD-ROM drive, 
floppy disk drive, and up to two hard disk drives. The lithium-ion
battery provides up to four hours of battery life. Audio and 
video are also standard features providing CD-quality 16-bit 
audio and 24-bit color depth. 

For more information, contact: Nature Worldwide Technology 
Corp., No.1, Min-Chuan Street, Tu-Cheng Industrial Park, Taipei 
Hsien, Taiwan, R.O.C.; Phone: +886-2-22689901; Fax: +886-2-22689903;
WWW: http://www.naturetech.com.tw.


Plumtree Releases UNIX Version of 
Plumtree Corporate Portal 

Plumtree Software announced the general availability of the 
Plumtree Corporate Portal 4.0 for Sun Solaris, the first version of 
the company’s software running entirely on the UNIX platform. 
According to the company, the release brings to UNIX a document
directory for indexing and categorizing documents, the Massively
Parallel Portal Engine for assembling Web services in parallel, a 
library of Enterprise Class Gadget modules for integrating Web ser- 
vices from other applications, comprehensive security and adminis- 
tration tools, and a communities and collaboration framework. 

For the new Web user interface, the Plumtree Corporate Portal 
for Solaris features the Plumtree Java Presentation Layer, written in 
Java 2 Enterprise Edition (J2EE)-compliant JavaServer Pages. 
Customers with experience in Java can perform portal development 
in Java and customize the user interface without changing the func- 
tionality, lowering deployment costs. The Java Presentation Layer 
can run on either a Java servlet engine or an application server, such 
as BEA WebLogic or IBM WebSphere. The Java Presentation 
Layer will be the basis for the next-generation Web user interface 
for the Plumtree Corporate Portal for Windows, using ASP.NET 
and C#. 

For more information, contact: Plumtree Software, 500 Sansome 
Street, San Francisco, CA 94111; Phone: (415) 263-8900; Fax: (415) 
263-8991; WWW: http://www.corporateportal.com.


Automatos Announces Remote 
Diagnostics Suite 


Automatos Inc. announced the launch of the Automatos Virtual 
Systems Engineer (AVSE) product suite. According to the com- 
pany, the AVSE product suite is a set of automated management 
services that allow the user to identify operating systems and appli- 
cation-related performance problems. To achieve this automation, 
the services combine Automatos’s tools for performance analysis, 
capacity planning, real time monitoring, asset inventory/management, 
Oracle monitoring, and SQL Server monitoring. Users can connect 
to the AVSE portal via the Internet, download the specific agents, 
and access the services needed to manage their server problems. 

The AVSE operates by analyzing internal resources of different 
operating systems and applications. The agent will collect data 
(e.g., CPU consumption, memory usage, memory locks, procedure 
cache, connection counters, cache hit rates, and logs) and will 
send it securely to the AVSE backend. The information is then
compiled into a report in PDF format.

The AVSE product suite is compatible with servers running 
Windows NT/2000, Linux, Solaris, HP-UX, AIX, Oracle and SQL 
Server environments. 

For more information, contact: Automatos Inc., 19925 Stevens
Creek, Cupertino, CA 95014-2358; Phone: (800) 887-7757; Fax: 
(408) 973-7231; WWW: http://www.automatos.com. 


SoftIntegration Launches Ch 2.0 

SoftIntegration has launched Ch 2.0. According to the company, 
Ch is a C virtual machine with classes that avoids compile/link/exe- 
cute/debug cycles for application development. It supports C90, 
POSIX, and socket/WinSock with more than 1,000 functions. 
Programs using the complex type defined in C99 and complex
classes in C++ can run in Ch without any modification.

Ch has many high-level extensions to C, including classes in
C++, very high-level shell programming, cross-platform Ch applets, 
generic functions, string type, computational arrays for linear algebra 
and matrix computations, user-friendly 2D/3D graphic plotting, 
classes for Common Gateway Interface (CGI), and advanced high- 
level numerical functions. 

Ch is an alternative to other scripting languages and mathematical 
software packages for cross-platform shell programming, systems 
administration, regression testing, real-time interactive computing, rapid
prototyping, scientific numerical computing, and Web development. 
Program ch can also be used as a login shell. 

According to SoftIntegration, Ch bridges the gap between 
the C language and UNIX shell and is designed for both inter- 
active command interpretation and shell programming. Ch is a 
portable shell with consistent interface in Windows, Linux, and 
UNIX. When Ch is used in a single command mode, most fea- 
tures of Ch are the same as conventional UNIX shells and MS- 
DOS shell in Windows. Shell programs written in the common 
set of programming features of C and Ch can be readily com- 
piled to the native machine code for efficient execution. 

Ch is available for Windows 95/98/Me/NT/2000, Linux, Solaris, and
HP-UX. For more information, contact: SoftIntegration, Inc.,
216 F Street, #68, Davis, CA 95616; Phone: (530) 297-7398;
Fax: (530) 297-7392; Internet: sales@softintegration.com;
WWW: http://www.softintegration.com.


WRQ Releases Reflection 9.0

WRQ, Inc. announced the release of version 9.0 of its WRQ 
Reflection terminal emulation and PC/UNIX integration products. 
According to the company, enhanced security through Reflection 
Security Components 9.0 provides Kerberos, SSL, and TLS sup- 
port, which facilitate encryption and authentication to host systems. 
It also permits XDM-Authorization and integration of third-party 
SSH clients. Added support for XML provides businesses with 
a standard for capturing and exchanging data between PC 
XML-enabled applications and UNIX, Linux, and HP e3000 host 
systems. Linux Console support allows for ANSI colors, attribute 
color mapping, and keyboard and mouse mapping for Windows 
desktop users connecting to Linux hosts. 

Multiple X display support allows X Window sessions from 
multiple UNIX hosts to be displayed on a single PC with indepen- 
dent display configurations. Customization capabilities include a 
new setting search and printing commands, adding tailoring con- 
trols for Reflection for IBM users. New versions of the WRQ 
Reflection products are scheduled to be available worldwide by 
Fall 2001, which includes its terminal emulation products, 
Reflection for IBM, Reflection for UNIX and OpenVMS, and 
Reflection for HP, as well as PC/UNIX integration solutions, 
Reflection X and Reflection Suite for X. 

For more information, contact: WRQ, Inc. 1500 Dexter Avenue 
North, Seattle, WA 98109; Phone: (206) 217-7100; Fax: (206) 
217-7515; WWW: http://www.wrq.com.